SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
All Classes Namespaces Functions Variables Typedefs Enumerations Enumerator Friends Modules Pages
Linear least squares

Solve \(AX \cong B\), over-determined (tall \(A\)) or under-determined (wide \(A\)) More...

Functions

template<typename scalar_t >
void slate::gels (Matrix< scalar_t > &A, Matrix< scalar_t > &BX, Options const &opts)
 Distributed parallel least squares solve via QR or LQ factorization.
 
template<typename scalar_t >
void slate::gels_cholqr (Matrix< scalar_t > &A, Matrix< scalar_t > &R, Matrix< scalar_t > &BX, Options const &opts)
 Distributed parallel least squares solve via CholeskyQR factorization.
 
template<typename scalar_t >
void slate::gels_qr (Matrix< scalar_t > &A, TriangularFactors< scalar_t > &T, Matrix< scalar_t > &BX, Options const &opts)
 Distributed parallel least squares solve via QR or LQ factorization.
 

Detailed Description

Solve \(AX \cong B\), over-determined (tall \(A\)) or under-determined (wide \(A\))

Function Documentation

◆ gels()

template<typename scalar_t >
void slate::gels ( Matrix< scalar_t > &  A,
Matrix< scalar_t > &  BX,
Options const &  opts 
)

Distributed parallel least squares solve via QR or LQ factorization.

Solves overdetermined or underdetermined complex linear systems involving an m-by-n matrix \(A\), using a QR or LQ factorization of \(A\). It is assumed that \(A\) has full rank. \(X\) is n-by-nrhs, \(B\) is m-by-nrhs, \(BX\) is max(m, n)-by-nrhs.

If m >= n, solves over-determined \(A X = B\) with least squares solution \(X\) that minimizes \(\norm{ A X - B }_2\). \(BX\) is m-by-nrhs. On input, \(B\) is all m rows of \(BX\). On output, \(X\) is first n rows of \(BX\). Currently in this case \(A\) must be not transposed.

If m < n, solves under-determined \(A X = B\) with minimum norm solution \(X\) that minimizes \(\norm{ X }_2\). \(BX\) is n-by-nrhs. On input, \(B\) is first m rows of \(BX\). On output, \(X\) is all n rows of \(BX\). Currently in this case \(A\) must be transposed (only if real) or conjugate-transposed.

Several right hand side vectors \(b\) and solution vectors \(x\) can be handled in a single call; they are stored as the columns of the m-by-nrhs right hand side matrix \(B\) and the n-by-nrhs solution matrix \(X\).

Note these (m, n) differ from (M, N) in (Sca)LAPACK, where the original \(A\) is M-by-N, before applying any transpose, while here \(A\) is m-by-n, after applying any transpose,.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the m-by-n matrix \(A\). On exit:
If (m >= n and \(A\) is not transposed) or (m < n and \(A\) is (conjugate) transposed), \(A\) is overwritten by details of its QR factorization as returned by geqrf.
If (m >= n and \(A\) is (conjugate) transposed) or (m < n and \(A\) is not transposed), \(A\) is overwritten by details of its LQ factorization as returned by gelqf (todo: not currently supported).
[in,out]BXMatrix of size max(m,n)-by-nrhs. On entry, the m-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.

TODO: return value

Return values
0successful exit
>0for return value = \(i\), in the computed triangular factor, \(R(i,i)\) is exactly zero, so that \(A\) does not have full rank; the least squares solution could not be computed.

◆ gels_cholqr()

template<typename scalar_t >
void slate::gels_cholqr ( Matrix< scalar_t > &  A,
Matrix< scalar_t > &  R,
Matrix< scalar_t > &  BX,
Options const &  opts 
)

Distributed parallel least squares solve via CholeskyQR factorization.

Solves overdetermined or underdetermined complex linear systems involving an m-by-n matrix \(A\), using a CholeskyQR factorization of \(A\). It is assumed that \(A\) has full rank. \(X\) is n-by-nrhs, \(B\) is m-by-nrhs, \(BX\) is max(m, n)-by-nrhs.

If m >= n, solves over-determined \(A X = B\) with least squares solution \(X\) that minimizes \(\norm{ A X - B }_2\). \(BX\) is m-by-nrhs. On input, \(B\) is all m rows of \(BX\). On output, \(X\) is first n rows of \(BX\). Currently in this case \(A\) must be not transposed.

If m < n, solves under-determined \(A X = B\) with minimum norm solution \(X\) that minimizes \(\norm{ X }_2\). \(BX\) is n-by-nrhs. On input, \(B\) is first m rows of \(BX\). On output, \(X\) is all n rows of \(BX\). Currently in this case \(A\) must be transposed (only if real) or conjugate-transposed.

Several right hand side vectors \(b\) and solution vectors \(x\) can be handled in a single call; they are stored as the columns of the m-by-nrhs right hand side matrix \(B\) and the n-by-nrhs solution matrix \(X\).

Note these (m, n) differ from (M, N) in (Sca)LAPACK, where the original \(A\) is M-by-N, before applying any transpose, while here \(A\) is m-by-n, after applying any transpose,.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the m-by-n matrix \(A\). On exit:
If (m >= n and \(A\) is not transposed) or (m < n and \(A\) is (conjugate) transposed), \(A\) is overwritten by details of its QR factorization as returned by geqrf.
If (m >= n and \(A\) is (conjugate) transposed) or (m < n and \(A\) is not transposed), \(A\) is overwritten by details of its LQ factorization as returned by gelqf (todo: not currently supported).
[out]RThe triangular matrix from the CholeskyQR factorization, as returned by cholqr.
[in,out]BXMatrix of size max(m,n)-by-nrhs. On entry, the m-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.

TODO: return value

Return values
0successful exit
>0for return value = \(i\), in the computed triangular factor, \(R(i,i)\) is exactly zero, so that \(A\) does not have full rank; the least squares solution could not be computed.

◆ gels_qr()

template<typename scalar_t >
void slate::gels_qr ( Matrix< scalar_t > &  A,
TriangularFactors< scalar_t > &  T,
Matrix< scalar_t > &  BX,
Options const &  opts 
)

Distributed parallel least squares solve via QR or LQ factorization.

Solves overdetermined or underdetermined complex linear systems involving an m-by-n matrix \(A\), using a QR or LQ factorization of \(A\). It is assumed that \(A\) has full rank. \(X\) is n-by-nrhs, \(B\) is m-by-nrhs, \(BX\) is max(m, n)-by-nrhs.

If m >= n, solves over-determined \(A X = B\) with least squares solution \(X\) that minimizes \(\norm{ A X - B }_2\). \(BX\) is m-by-nrhs. On input, \(B\) is all m rows of \(BX\). On output, \(X\) is first n rows of \(BX\). Currently in this case \(A\) must be not transposed.

If m < n, solves under-determined \(A X = B\) with minimum norm solution \(X\) that minimizes \(\norm{ X }_2\). \(BX\) is n-by-nrhs. On input, \(B\) is first m rows of \(BX\). On output, \(X\) is all n rows of \(BX\). Currently in this case \(A\) must be transposed (only if real) or conjugate-transposed.

Several right hand side vectors \(b\) and solution vectors \(x\) can be handled in a single call; they are stored as the columns of the m-by-nrhs right hand side matrix \(B\) and the n-by-nrhs solution matrix \(X\).

Note these (m, n) differ from (M, N) in (Sca)LAPACK, where the original \(A\) is M-by-N, before applying any transpose, while here \(A\) is m-by-n, after applying any transpose,.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the m-by-n matrix \(A\). On exit:
If (m >= n and \(A\) is not transposed) or (m < n and \(A\) is (conjugate) transposed), \(A\) is overwritten by details of its QR factorization as returned by geqrf.
If (m >= n and \(A\) is (conjugate) transposed) or (m < n and \(A\) is not transposed), \(A\) is overwritten by details of its LQ factorization as returned by gelqf (todo: not currently supported).
[out]TThe triangular matrices of the block reflectors from the QR or LQ factorization, as returned by geqrf or gelqf.
[in,out]BXMatrix of size max(m,n)-by-nrhs. On entry, the m-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.

TODO: return value

Return values
0successful exit
>0for return value = \(i\), in the computed triangular factor, \(R(i,i)\) is exactly zero, so that \(A\) does not have full rank; the least squares solution could not be computed.