SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale

Factor \(A = LQ\), multiply by \(Q\), generate \(Q\).

Functions

template<typename scalar_t >
void slate::gelqf (Matrix< scalar_t > &A, TriangularFactors< scalar_t > &T, Options const &opts)
 Distributed parallel LQ factorization.
 
template<typename scalar_t >
void slate::unmlq (Side side, Op op, Matrix< scalar_t > &A, TriangularFactors< scalar_t > &T, Matrix< scalar_t > &C, Options const &opts)
 Distributed parallel multiply by \(Q\) from LQ factorization.
 

Detailed Description

Factor \(A = LQ\), multiply by \(Q\), generate \(Q\).

Function Documentation

◆ gelqf()

template<typename scalar_t>
void slate::gelqf(Matrix<scalar_t>& A,
                  TriangularFactors<scalar_t>& T,
                  Options const& opts)

Distributed parallel LQ factorization.

Computes an LQ factorization of an m-by-n matrix \(A\). The factorization has the form

\[ A = LQ, \]

where \(Q\) is a matrix with orthonormal rows and \(L\) is lower triangular (or lower trapezoidal if m > n).

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A: On entry, the m-by-n matrix \(A\). On exit, the elements on and below the diagonal of the array contain the m-by-min(m,n) lower trapezoidal matrix \(L\) (lower triangular if m <= n); the elements above the diagonal represent the unitary matrix \(Q\) as a product of elementary reflectors.
[out] T: On exit, triangular matrices of the block reflectors.
[in] opts: Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
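
For reference, a minimal calling sketch (not part of the SLATE distribution). It assumes a working SLATE + MPI build; the dimensions, tile size nb, and the 2-by-2 process grid are illustrative choices, so it should be launched with 4 MPI ranks.

    #include <slate/slate.hh>
    #include <mpi.h>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int64_t m = 1000, n = 2000, nb = 256;  // short-and-wide A (m <= n); illustrative sizes
        int p = 2, q = 2;                      // 2-by-2 MPI process grid (4 ranks)

        // Distributed m-by-n matrix A with nb-by-nb tiles on a p-by-q process grid.
        slate::Matrix<double> A(m, n, nb, p, q, MPI_COMM_WORLD);
        A.insertLocalTiles();
        // ... fill A with application data here ...

        // Receives the triangular factors of the block reflectors.
        slate::TriangularFactors<double> T;

        // Factor A = L Q. On exit, the part of A on and below the diagonal holds L;
        // the part above the diagonal holds the elementary reflectors defining Q.
        slate::gelqf(A, T, {
            {slate::Option::Lookahead, 1},
            {slate::Option::Target, slate::Target::HostTask}
        });

        MPI_Finalize();
        return 0;
    }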

◆ unmlq()

template<typename scalar_t>
void slate::unmlq(Side side,
                  Op op,
                  Matrix<scalar_t>& A,
                  TriangularFactors<scalar_t>& T,
                  Matrix<scalar_t>& C,
                  Options const& opts)

Distributed parallel multiply by \(Q\) from LQ factorization.

Multiplies the general m-by-n matrix \(C\) by \(Q\) from the LQ factorization, according to:

op              side = Left    side = Right
op = NoTrans    \(Q C\)        \(C Q\)
op = ConjTrans  \(Q^H C\)      \(C Q^H\)

where \(Q\) is a unitary matrix defined as the product of k elementary reflectors

\[ Q = H(1) H(2) \cdots H(k) \]

as returned by gelqf. \(Q\) is of order m if side = Left and of order n if side = Right.

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in] side:
  • Side::Left: apply \(Q\) or \(Q^H\) from the left;
  • Side::Right: apply \(Q\) or \(Q^H\) from the right.
[in] op:
  • Op::NoTrans: apply \(Q\);
  • Op::ConjTrans: apply \(Q^H\);
  • Op::Trans: apply \(Q^T\) (real matrices only). In the real case, Op::Trans is equivalent to Op::ConjTrans; in the complex case, Op::Trans is not allowed.
[in] A: Details of the LQ factorization of the original matrix \(A\), as returned by gelqf.
[in] T: Triangular matrices of the block reflectors, as returned by gelqf.
[in,out] C: On entry, the m-by-n matrix \(C\). On exit, \(C\) is overwritten by \(Q C\), \(Q^H C\), \(C Q\), or \(C Q^H\).
[in] opts: Additional options, as map of name = value pairs. Possible options:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
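
Again for reference, a minimal sketch of applying \(Q^H\) from the left, i.e. \(C \gets Q^H C\), reusing A and T as produced by a previous slate::gelqf call. apply_qh_left is a hypothetical helper, not part of the SLATE API; nA, nrhs, nb, and the process grid are assumed to match the caller's setup.

    #include <slate/slate.hh>
    #include <mpi.h>

    // Hypothetical helper: A and T hold the output of slate::gelqf on an mA-by-nA
    // matrix. Q acts on vectors of length nA (the column dimension of the factored A),
    // so with side = Left, C must have nA rows.
    void apply_qh_left(slate::Matrix<double>& A,
                       slate::TriangularFactors<double>& T,
                       int64_t nA, int64_t nrhs,
                       int64_t nb, int p, int q)
    {
        // Distributed nA-by-nrhs matrix C on the same p-by-q process grid as A.
        slate::Matrix<double> C(nA, nrhs, nb, p, q, MPI_COMM_WORLD);
        C.insertLocalTiles();
        // ... fill C with application data here ...

        // C := Q^H C (for real scalar types, Q^H is the same as Q^T).
        slate::unmlq(slate::Side::Left, slate::Op::ConjTrans, A, T, C, {
            {slate::Option::Target, slate::Target::HostTask}
        });
    }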