SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale

Factor \(A = QR\), multiply by \(Q\), generate \(Q\).

Functions

template <typename scalar_t>
void slate::cholqr(Matrix<scalar_t>& A, Matrix<scalar_t>& R, Options const& opts)
    Distributed parallel Cholesky QR factorization.

template <typename scalar_t>
void slate::geqrf(Matrix<scalar_t>& A, TriangularFactors<scalar_t>& T, Options const& opts)
    Distributed parallel QR factorization.

template <typename scalar_t>
void slate::unmqr(Side side, Op op, Matrix<scalar_t>& A, TriangularFactors<scalar_t>& T, Matrix<scalar_t>& C, Options const& opts)
    Distributed parallel multiply by \(Q\) from QR factorization.

Detailed Description

Factor \(A = QR\), multiply by \(Q\), generate \(Q\).

Function Documentation

◆ cholqr()

template <typename scalar_t>
void slate::cholqr(
    Matrix<scalar_t>& A,
    Matrix<scalar_t>& R,
    Options const& opts)

Distributed parallel Cholesky QR factorization.

Computes a QR factorization of an m-by-n matrix \(A\). The factorization has the form

\[ A = QR, \]

where \(Q\) is a matrix with orthonormal columns and \(R\) is upper triangular (or upper trapezoidal if m < n).
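
In outline (a standard description of the method, added here for context, not taken from this page), Cholesky QR obtains \(R\) from a Cholesky factorization of the Gram matrix and recovers \(Q\) by a triangular solve; for full-rank \(A\) with \(m \ge n\):

\[ A^H A = R^H R \quad \text{(Cholesky factorization)}, \qquad Q = A R^{-1}. \]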

Template Parameters
    scalar_t  One of float, double, std::complex<float>, std::complex<double>.

Parameters
    [in,out] A     On entry, the m-by-n matrix \(A\). On exit, the matrix \(Q\), of the same size as A.
    [out]    R     On exit, the n-by-n matrix \(R\); its upper triangular part contains the Cholesky factor.
    [in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::MethodCholQR: Select the algorithm used to compute \(A^H A\):
    • Auto:
    • GemmA:
    • GemmC:
    • HerkC:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
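
A minimal calling sketch (the grid dimensions p, q, the tile size nb, and the problem sizes below are illustrative assumptions, not prescribed by this page):

    #include <slate/slate.hh>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int64_t m = 2000, n = 500, nb = 256;  // assumed problem and tile sizes
        int p = 2, q = 2;                     // assumed 2x2 MPI process grid

        // A is m-by-n; R is n-by-n. Both are distributed over the p-by-q grid.
        slate::Matrix<double> A(m, n, nb, p, q, MPI_COMM_WORLD);
        slate::Matrix<double> R(n, n, nb, p, q, MPI_COMM_WORLD);
        A.insertLocalTiles();
        R.insertLocalTiles();
        // ... fill A with application data ...

        // On return, A is overwritten by Q; R holds the triangular factor.
        slate::cholqr(A, R, {
            {slate::Option::Target, slate::Target::HostTask},
        });

        MPI_Finalize();
        return 0;
    }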

◆ geqrf()

template <typename scalar_t>
void slate::geqrf(
    Matrix<scalar_t>& A,
    TriangularFactors<scalar_t>& T,
    Options const& opts)

Distributed parallel QR factorization.

Computes a QR factorization of an m-by-n matrix \(A\). The factorization has the form

\[ A = QR, \]

where \(Q\) is a matrix with orthonormal columns and \(R\) is upper triangular (or upper trapezoidal if m < n).

Complexity (in real arithmetic):

  • for \(m \ge n\), \(\approx 2 m n^{2} - \frac{2}{3} n^{3}\) flops;
  • for \(m \le n\), \(\approx 2 m^{2} n - \frac{2}{3} m^{3}\) flops;
  • for \(m = n\), \(\approx \frac{4}{3} n^{3}\) flops.
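
For instance (an illustrative calculation, not from this page), a tall matrix with \(m = 10{,}000\) and \(n = 1000\) costs about

\[ 2 m n^2 - \tfrac{2}{3} n^3 = 2 \cdot 10^{10} - \tfrac{2}{3} \cdot 10^{9} \approx 1.9 \times 10^{10} \text{ flops}. \]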
Template Parameters
    scalar_t  One of float, double, std::complex<float>, std::complex<double>.

Parameters
    [in,out] A     On entry, the m-by-n matrix \(A\). On exit, the elements on and above the diagonal of A contain the min(m,n)-by-n upper trapezoidal matrix \(R\) (upper triangular if m >= n); the elements below the diagonal represent the unitary matrix \(Q\) as a product of elementary reflectors.
    [out]    T     On exit, triangular matrices of the block reflectors.
    [in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
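
A minimal sketch of a call (matrix setup is assumed to follow the cholqr example above; see the combined example after unmqr for applying \(Q\)):

    #include <slate/slate.hh>

    // Assumes MPI is already initialized and A is an m-by-n slate::Matrix
    // distributed over a p-by-q grid, as in the cholqr example above.
    template <typename scalar_t>
    void qr_factor(slate::Matrix<scalar_t>& A,
                   slate::TriangularFactors<scalar_t>& T) {
        // A is overwritten by R (on and above the diagonal) and the
        // Householder vectors (below the diagonal); T receives the
        // triangular factors of the block reflectors.
        slate::geqrf(A, T, {
            {slate::Option::Lookahead, 1},
            {slate::Option::InnerBlocking, 16},
            {slate::Option::Target, slate::Target::HostTask},
        });
    }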

◆ unmqr()

template <typename scalar_t>
void slate::unmqr(
    Side side,
    Op op,
    Matrix<scalar_t>& A,
    TriangularFactors<scalar_t>& T,
    Matrix<scalar_t>& C,
    Options const& opts)

Distributed parallel multiply by \(Q\) from QR factorization.

Multiplies the general m-by-n matrix \(C\) by \(Q\) from the QR factorization, according to:

    op               side = Left    side = Right
    op = NoTrans     \(Q C\)        \(C Q\)
    op = ConjTrans   \(Q^H C\)      \(C Q^H\)

where \(Q\) is a unitary matrix defined as the product of k elementary reflectors

\[ Q = H(1) H(2) \cdots H(k) \]

as returned by geqrf. \(Q\) is of order m if side = Left and of order n if side = Right.

Template Parameters
    scalar_t  One of float, double, std::complex<float>, std::complex<double>.

Parameters
    [in] side
        • Side::Left: apply \(Q\) or \(Q^H\) from the left;
        • Side::Right: apply \(Q\) or \(Q^H\) from the right.
    [in] op
        • Op::NoTrans: apply \(Q\);
        • Op::ConjTrans: apply \(Q^H\);
        • Op::Trans: apply \(Q^T\) (real case only). In the real case, Op::Trans is equivalent to Op::ConjTrans; in the complex case, Op::Trans is not allowed.
    [in]     A     Details of the QR factorization of the original matrix \(A\), as returned by geqrf.
    [in]     T     Triangular matrices of the block reflectors, as returned by geqrf.
    [in,out] C     On entry, the m-by-n matrix \(C\). On exit, \(C\) is overwritten by \(Q C\), \(Q^H C\), \(C Q\), or \(C Q^H\).
    [in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
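
Putting geqrf and unmqr together, a sketch of applying \(Q^H\) to a block of right-hand sides, as in the first step of a QR-based least-squares solve (matrix setup, sizes, and grid are illustrative assumptions):

    #include <slate/slate.hh>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int64_t m = 2000, n = 500, nrhs = 10, nb = 256;  // assumed sizes
        int p = 2, q = 2;                                // assumed process grid

        slate::Matrix<double> A(m, n,    nb, p, q, MPI_COMM_WORLD);
        slate::Matrix<double> C(m, nrhs, nb, p, q, MPI_COMM_WORLD);
        A.insertLocalTiles();
        C.insertLocalTiles();
        // ... fill A and C with application data ...

        // Factor A = QR; A now holds R and the Householder vectors.
        slate::TriangularFactors<double> T;
        slate::geqrf(A, T, {});

        // Overwrite C with Q^H C. For least squares, min ||A x - C|| then
        // reduces to the triangular solve R x = (Q^H C)(1:n, :).
        slate::unmqr(slate::Side::Left, slate::Op::ConjTrans, A, T, C, {});

        MPI_Finalize();
        return 0;
    }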