SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale

Factor \(A = LQ\), multiply by \(Q\), generate \(Q\).

Functions

template<typename scalar_t >
void slate::gelqf (Matrix< scalar_t > &A, TriangularFactors< scalar_t > &T, Options const &opts)
 Distributed parallel LQ factorization.
 
template<typename scalar_t >
void slate::unmlq (Side side, Op op, Matrix< scalar_t > &A, TriangularFactors< scalar_t > &T, Matrix< scalar_t > &C, Options const &opts)
 Distributed parallel multiply by \(Q\) from LQ factorization.
 

Detailed Description

Factor \(A = LQ\), multiply by \(Q\), generate \(Q\).

Function Documentation

◆ gelqf()

template<typename scalar_t>
void slate::gelqf(Matrix<scalar_t>& A,
                  TriangularFactors<scalar_t>& T,
                  Options const& opts)

Distributed parallel LQ factorization.

Computes an LQ factorization of an m-by-n matrix \(A\). The factorization has the form

\[ A = LQ, \]

where \(Q\) is a matrix with orthonormal rows and \(L\) is lower triangular (or lower trapezoidal if m > n).

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A: On entry, the m-by-n matrix \(A\). On exit, the elements on and below the diagonal of the array contain the m-by-min(m,n) lower trapezoidal matrix \(L\) (lower triangular if m <= n); the elements above the diagonal represent the unitary matrix \(Q\) as a product of elementary reflectors.
[out] T: On exit, triangular matrices of the block reflectors.
[in] opts: Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
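
For reference, a minimal calling sketch (not part of the SLATE distribution). It assumes a working SLATE + MPI build; the dimensions, tile size nb, and the 2-by-2 process grid are illustrative choices, so it should be launched with 4 MPI ranks.

    #include <slate/slate.hh>
    #include <mpi.h>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int64_t m = 1000, n = 2000, nb = 256;  // short-and-wide A (m <= n); illustrative sizes
        int p = 2, q = 2;                      // 2-by-2 MPI process grid (4 ranks)

        // Distributed m-by-n matrix A with nb-by-nb tiles on a p-by-q process grid.
        slate::Matrix<double> A(m, n, nb, p, q, MPI_COMM_WORLD);
        A.insertLocalTiles();
        // ... fill A with application data here ...

        // Receives the triangular factors of the block reflectors.
        slate::TriangularFactors<double> T;

        // Factor A = L Q. On exit, the part of A on and below the diagonal holds L;
        // the part above the diagonal holds the elementary reflectors defining Q.
        slate::gelqf(A, T, {
            {slate::Option::Lookahead, 1},
            {slate::Option::Target, slate::Target::HostTask}
        });

        MPI_Finalize();
        return 0;
    }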

◆ unmlq()

template<typename scalar_t>
void slate::unmlq(Side side,
                  Op op,
                  Matrix<scalar_t>& A,
                  TriangularFactors<scalar_t>& T,
                  Matrix<scalar_t>& C,
                  Options const& opts)

Distributed parallel multiply by \(Q\) from LQ factorization.

Multiplies the general m-by-n matrix \(C\) by \(Q\) from the LQ factorization, according to:

op              side = Left    side = Right
op = NoTrans    \(Q C\)        \(C Q\)
op = ConjTrans  \(Q^H C\)      \(C Q^H\)

where \(Q\) is a unitary matrix defined as the product of k elementary reflectors

\[ Q = H(1) H(2) \cdots H(k) \]

as returned by gelqf. \(Q\) is of order m if side = Left and of order n if side = Right.

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in] side:
  • Side::Left: apply \(Q\) or \(Q^H\) from the left;
  • Side::Right: apply \(Q\) or \(Q^H\) from the right.
[in] op:
  • Op::NoTrans: apply \(Q\);
  • Op::ConjTrans: apply \(Q^H\);
  • Op::Trans: apply \(Q^T\) (real matrices only). In the real case, Op::Trans is equivalent to Op::ConjTrans; in the complex case, Op::Trans is not allowed.
[in] A: Details of the LQ factorization of the original matrix \(A\), as returned by gelqf.
[in] T: Triangular matrices of the block reflectors, as returned by gelqf.
[in,out] C: On entry, the m-by-n matrix \(C\). On exit, \(C\) is overwritten by \(Q C\), \(Q^H C\), \(C Q\), or \(C Q^H\).
[in] opts: Additional options, as map of name = value pairs. Possible options:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
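
Again for reference, a minimal sketch of applying \(Q^H\) from the left, i.e. \(C \gets Q^H C\), reusing A and T as produced by a previous slate::gelqf call. apply_qh_left is a hypothetical helper, not part of the SLATE API; nA, nrhs, nb, and the process grid are assumed to match the caller's setup.

    #include <slate/slate.hh>
    #include <mpi.h>

    // Hypothetical helper: A and T hold the output of slate::gelqf on an mA-by-nA
    // matrix. Q acts on vectors of length nA (the column dimension of the factored A),
    // so with side = Left, C must have nA rows.
    void apply_qh_left(slate::Matrix<double>& A,
                       slate::TriangularFactors<double>& T,
                       int64_t nA, int64_t nrhs,
                       int64_t nb, int p, int q)
    {
        // Distributed nA-by-nrhs matrix C on the same p-by-q process grid as A.
        slate::Matrix<double> C(nA, nrhs, nb, p, q, MPI_COMM_WORLD);
        C.insertLocalTiles();
        // ... fill C with application data here ...

        // C := Q^H C (for real scalar types, Q^H is the same as Q^T).
        slate::unmlq(slate::Side::Left, slate::Op::ConjTrans, A, T, C, {
            {slate::Option::Target, slate::Target::HostTask}
        });
    }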