SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale

Factor \(A = QR\), multiply by \(Q\), generate \(Q\).

Functions

template <typename scalar_t>
void slate::cholqr(Matrix<scalar_t>& A, Matrix<scalar_t>& R, Options const& opts)
    Distributed parallel Cholesky QR factorization.

template <typename scalar_t>
void slate::geqrf(Matrix<scalar_t>& A, TriangularFactors<scalar_t>& T, Options const& opts)
    Distributed parallel QR factorization.

template <typename scalar_t>
void slate::unmqr(Side side, Op op, Matrix<scalar_t>& A, TriangularFactors<scalar_t>& T, Matrix<scalar_t>& C, Options const& opts)
    Distributed parallel multiply by \(Q\) from QR factorization.

Detailed Description

Factor \(A = QR\), multiply by \(Q\), generate \(Q\).

Function Documentation

◆ cholqr()

template <typename scalar_t>
void slate::cholqr(
    Matrix<scalar_t>& A,
    Matrix<scalar_t>& R,
    Options const& opts)

Distributed parallel Cholesky QR factorization.

Computes a QR factorization of an m-by-n matrix \(A\). The factorization has the form

\[ A = QR, \]

where \(Q\) is a matrix with orthonormal columns and \(R\) is upper triangular (or upper trapezoidal if m < n).
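
In outline (a standard description of the method, added here for context, not taken from this page), Cholesky QR obtains \(R\) from a Cholesky factorization of the Gram matrix and recovers \(Q\) by a triangular solve; for full-rank \(A\) with \(m \ge n\):

\[ A^H A = R^H R \quad \text{(Cholesky factorization)}, \qquad Q = A R^{-1}. \]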

Template Parameters
    scalar_t  One of float, double, std::complex<float>, std::complex<double>.

Parameters
    [in,out] A     On entry, the m-by-n matrix \(A\). On exit, the matrix \(Q\), of the same size as A.
    [out]    R     On exit, the n-by-n matrix \(R\); its upper triangular part contains the Cholesky factor.
    [in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::MethodCholQR: Select the algorithm used to compute \(A^H A\):
    • Auto:
    • GemmA:
    • GemmC:
    • HerkC:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
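
A minimal calling sketch (the grid dimensions p, q, the tile size nb, and the problem sizes below are illustrative assumptions, not prescribed by this page):

    #include <slate/slate.hh>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int64_t m = 2000, n = 500, nb = 256;  // assumed problem and tile sizes
        int p = 2, q = 2;                     // assumed 2x2 MPI process grid

        // A is m-by-n; R is n-by-n. Both are distributed over the p-by-q grid.
        slate::Matrix<double> A(m, n, nb, p, q, MPI_COMM_WORLD);
        slate::Matrix<double> R(n, n, nb, p, q, MPI_COMM_WORLD);
        A.insertLocalTiles();
        R.insertLocalTiles();
        // ... fill A with application data ...

        // On return, A is overwritten by Q; R holds the triangular factor.
        slate::cholqr(A, R, {
            {slate::Option::Target, slate::Target::HostTask},
        });

        MPI_Finalize();
        return 0;
    }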

◆ geqrf()

template <typename scalar_t>
void slate::geqrf(
    Matrix<scalar_t>& A,
    TriangularFactors<scalar_t>& T,
    Options const& opts)

Distributed parallel QR factorization.

Computes a QR factorization of an m-by-n matrix \(A\). The factorization has the form

\[ A = QR, \]

where \(Q\) is a matrix with orthonormal columns and \(R\) is upper triangular (or upper trapezoidal if m < n).

Complexity (in real arithmetic):

  • for \(m \ge n\), \(\approx 2 m n^{2} - \frac{2}{3} n^{3}\) flops;
  • for \(m \le n\), \(\approx 2 m^{2} n - \frac{2}{3} m^{3}\) flops;
  • for \(m = n\), \(\approx \frac{4}{3} n^{3}\) flops.
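
For instance (an illustrative calculation, not from this page), a tall matrix with \(m = 10{,}000\) and \(n = 1000\) costs about

\[ 2 m n^2 - \tfrac{2}{3} n^3 = 2 \cdot 10^{10} - \tfrac{2}{3} \cdot 10^{9} \approx 1.9 \times 10^{10} \text{ flops}. \]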
Template Parameters
    scalar_t  One of float, double, std::complex<float>, std::complex<double>.

Parameters
    [in,out] A     On entry, the m-by-n matrix \(A\). On exit, the elements on and above the diagonal of A contain the min(m,n)-by-n upper trapezoidal matrix \(R\) (upper triangular if m >= n); the elements below the diagonal represent the unitary matrix \(Q\) as a product of elementary reflectors.
    [out]    T     On exit, triangular matrices of the block reflectors.
    [in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
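
A minimal sketch of a call (matrix setup is assumed to follow the cholqr example above; see the combined example after unmqr for applying \(Q\)):

    #include <slate/slate.hh>

    // Assumes MPI is already initialized and A is an m-by-n slate::Matrix
    // distributed over a p-by-q grid, as in the cholqr example above.
    template <typename scalar_t>
    void qr_factor(slate::Matrix<scalar_t>& A,
                   slate::TriangularFactors<scalar_t>& T) {
        // A is overwritten by R (on and above the diagonal) and the
        // Householder vectors (below the diagonal); T receives the
        // triangular factors of the block reflectors.
        slate::geqrf(A, T, {
            {slate::Option::Lookahead, 1},
            {slate::Option::InnerBlocking, 16},
            {slate::Option::Target, slate::Target::HostTask},
        });
    }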

◆ unmqr()

template <typename scalar_t>
void slate::unmqr(
    Side side,
    Op op,
    Matrix<scalar_t>& A,
    TriangularFactors<scalar_t>& T,
    Matrix<scalar_t>& C,
    Options const& opts)

Distributed parallel multiply by \(Q\) from QR factorization.

Multiplies the general m-by-n matrix \(C\) by \(Q\) from the QR factorization, according to:

    op               side = Left    side = Right
    op = NoTrans     \(Q C\)        \(C Q\)
    op = ConjTrans   \(Q^H C\)      \(C Q^H\)

where \(Q\) is a unitary matrix defined as the product of k elementary reflectors

\[ Q = H(1) H(2) \cdots H(k) \]

as returned by geqrf. \(Q\) is of order m if side = Left and of order n if side = Right.

Template Parameters
    scalar_t  One of float, double, std::complex<float>, std::complex<double>.

Parameters
    [in] side
        • Side::Left: apply \(Q\) or \(Q^H\) from the left;
        • Side::Right: apply \(Q\) or \(Q^H\) from the right.
    [in] op
        • Op::NoTrans: apply \(Q\);
        • Op::ConjTrans: apply \(Q^H\);
        • Op::Trans: apply \(Q^T\) (real case only). In the real case, Op::Trans is equivalent to Op::ConjTrans; in the complex case, Op::Trans is not allowed.
    [in]     A     Details of the QR factorization of the original matrix \(A\), as returned by geqrf.
    [in]     T     Triangular matrices of the block reflectors, as returned by geqrf.
    [in,out] C     On entry, the m-by-n matrix \(C\). On exit, \(C\) is overwritten by \(Q C\), \(Q^H C\), \(C Q\), or \(C Q^H\).
    [in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
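
Putting geqrf and unmqr together, a sketch of applying \(Q^H\) to a block of right-hand sides, as in the first step of a QR-based least-squares solve (matrix setup, sizes, and grid are illustrative assumptions):

    #include <slate/slate.hh>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int64_t m = 2000, n = 500, nrhs = 10, nb = 256;  // assumed sizes
        int p = 2, q = 2;                                // assumed process grid

        slate::Matrix<double> A(m, n,    nb, p, q, MPI_COMM_WORLD);
        slate::Matrix<double> C(m, nrhs, nb, p, q, MPI_COMM_WORLD);
        A.insertLocalTiles();
        C.insertLocalTiles();
        // ... fill A and C with application data ...

        // Factor A = QR; A now holds R and the Householder vectors.
        slate::TriangularFactors<double> T;
        slate::geqrf(A, T, {});

        // Overwrite C with Q^H C. For least squares, min ||A x - C|| then
        // reduces to the triangular solve R x = (Q^H C)(1:n, :).
        slate::unmqr(slate::Side::Left, slate::Op::ConjTrans, A, T, C, {});

        MPI_Finalize();
        return 0;
    }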