SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale

Factor, forward and back solve, invert.

Functions

template<typename scalar_t >
void slate::gerbt (Matrix< scalar_t > &U_in, Matrix< scalar_t > &A, Matrix< scalar_t > &V)
 Applies a two-sided random butterfly transform (RBT) to the given matrix.
 
template<typename scalar_t >
void slate::gerbt (Matrix< scalar_t > &Uin, Matrix< scalar_t > &B)
 Applies a one-sided random butterfly transform (RBT) to the given matrix on the left.
 
template<typename scalar_t >
int64_t slate::getrf (Matrix< scalar_t > &A, Pivots &pivots, Options const &opts)
 Distributed parallel LU factorization.
 
template<typename scalar_t >
int64_t slate::getrf_nopiv (Matrix< scalar_t > &A, Options const &opts)
 Distributed parallel LU factorization without pivoting.
 
template<typename scalar_t >
int64_t slate::getrf_tntpiv (Matrix< scalar_t > &A, Pivots &pivots, Options const &opts)
 Distributed parallel LU factorization.
 
template<typename scalar_t >
void slate::getri (Matrix< scalar_t > &A, Pivots &pivots, Options const &opts)
 Distributed parallel LU inversion.
 
template<typename scalar_t >
void slate::getri (Matrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel LU inversion (out-of-place version).
 
template<typename scalar_t >
void slate::getrs (Matrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel LU solve.
 
template<typename scalar_t >
void slate::getrs_nopiv (Matrix< scalar_t > &A, Matrix< scalar_t > &B, Options const &opts=Options())
 Distributed parallel LU solve without pivoting (deprecated; use getrs with Option::MethodLU set to MethodLU::NoPiv).
 

Detailed Description

Factor, forward and back solve, invert.

Function Documentation

◆ gerbt() [1/2]

template<typename scalar_t >
void slate::gerbt ( Matrix< scalar_t >& U_in, Matrix< scalar_t >& A, Matrix< scalar_t >& V )

Applies a two-sided random butterfly transform (RBT) to the given matrix.

Parameters
[in]     U_in  The left transform in packed storage. Should be transposed.
[in,out] A     The matrix to transform.
[in]     V     The right transform in packed storage. Should not be transposed.

◆ gerbt() [2/2]

template<typename scalar_t >
void slate::gerbt ( Matrix< scalar_t >& Uin, Matrix< scalar_t >& B )

Applies a one-sided random butterfly transform (RBT) to the given matrix on the left.

Parameters
[in]     Uin  The transform in packed storage.
[in,out] B    The matrix to transform.

◆ getrf()

template<typename scalar_t >
int64_t slate::getrf ( Matrix< scalar_t >& A, Pivots& pivots, Options const& opts )

Distributed parallel LU factorization.

Computes an LU factorization of a general m-by-n matrix \(A\) using partial pivoting with row interchanges.

The factorization has the form

\[ A = P L U \]

where \(P\) is a permutation matrix, \(L\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U\) is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

Complexity (in real):

  • \(\approx m n^{2} - \frac{1}{3} n^{3}\) flops;
  • \(\approx \frac{2}{3} n^{3}\) flops for \(m = n\).
Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A       On entry, the matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = P L U\); the unit diagonal elements of \(L\) are not stored.
[out]    pivots  The pivot indices that define the permutation matrix \(P\).
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
  • Option::PivotThreshold: Strictness of the pivot selection. Between 0 and 1 with 1 giving partial pivoting and 0 giving no pivoting. Default 1.
  • Option::MethodLU: Algorithm for LU factorization.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, and division by zero will occur if it is used to solve a system of equations.
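
For illustration, a minimal calling sequence might look like the sketch below. The matrix size, tile size nb, the 2-by-2 process grid, and the option values are placeholders, not SLATE defaults.

    #include <slate/slate.hh>
    #include <mpi.h>

    int main( int argc, char** argv )
    {
        MPI_Init( &argc, &argv );

        int64_t n = 1000, nb = 256;   // global size and tile size (placeholders)
        int p = 2, q = 2;             // MPI process grid (placeholder)

        // n-by-n general matrix distributed over a p-by-q grid of MPI ranks.
        slate::Matrix<double> A( n, n, nb, p, q, MPI_COMM_WORLD );
        A.insertLocalTiles();
        // ... fill the locally owned tiles of A with the application's data ...

        slate::Pivots pivots;
        int64_t info = slate::getrf( A, pivots, {
            { slate::Option::Lookahead, 1 },
            { slate::Option::Target, slate::Target::HostTask }
        } );
        if (info != 0) {
            // U(info, info) is exactly zero; the factor U is singular.
        }

        MPI_Finalize();
        return 0;
    }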

◆ getrf_nopiv()

template<typename scalar_t >
int64_t slate::getrf_nopiv ( Matrix< scalar_t >& A, Options const& opts )

Distributed parallel LU factorization without pivoting.

Computes an LU factorization without pivoting of a general m-by-n matrix \(A\)

The factorization has the form

\[ A = L U \]

where \(L\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U\) is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A     On entry, the matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = L U\); the unit diagonal elements of \(L\) are not stored.
[in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. Since no pivoting is performed, the computed factors contain NaN values due to division by zero.
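
As a brief sketch (the wrapper name and option values below are illustrative, not SLATE defaults), the non-pivoted factorization uses the same matrix setup as getrf but takes no pivot vector:

    #include <slate/slate.hh>

    // Factor A = L U in place, without pivoting. Safe only when A is known to
    // tolerate it (e.g., diagonally dominant); SLATE does not check this.
    template <typename scalar_t>
    int64_t factor_nopiv( slate::Matrix<scalar_t>& A )
    {
        return slate::getrf_nopiv( A, {
            { slate::Option::InnerBlocking, 16 },
            { slate::Option::Target, slate::Target::Devices }
        } );
    }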

◆ getrf_tntpiv()

template<typename scalar_t >
int64_t slate::getrf_tntpiv ( Matrix< scalar_t >& A, Pivots& pivots, Options const& opts )

Distributed parallel LU factorization.

Computes an LU factorization of a general m-by-n matrix \(A\) using partial pivoting with row interchanges.

The factorization has the form

\[ A = P L U \]

where \(P\) is a permutation matrix, \(L\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U\) is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A       On entry, the matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = P L U\); the unit diagonal elements of \(L\) are not stored.
[out]    pivots  The pivot indices that define the permutation matrix \(P\).
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, and division by zero will occur if it is used to solve a system of equations.
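
The calling sequence is the same as getrf; a short sketch (the wrapper name and thread count are illustrative placeholders):

    #include <slate/slate.hh>

    // Tournament-pivoting panel variant: same interface and A = P L U result
    // as getrf.
    template <typename scalar_t>
    int64_t factor_tournament( slate::Matrix<scalar_t>& A, slate::Pivots& pivots )
    {
        return slate::getrf_tntpiv( A, pivots, {
            { slate::Option::MaxPanelThreads, 4 },  // placeholder thread count
            { slate::Option::Target, slate::Target::HostTask }
        } );
    }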

◆ getri() [1/2]

template<typename scalar_t >
void slate::getri ( Matrix< scalar_t >& A, Pivots& pivots, Matrix< scalar_t >& B, Options const& opts )

Distributed parallel LU inversion (out-of-place version).

Computes the inverse of a matrix \(A\) using the LU factorization \(A = P L U\) computed by getrf. Stores the result in \(B\). Does not change \(A\).

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]     A       The factors \(L\) and \(U\) from the factorization \(A = P L U\) as computed by getrf. Not modified.
[in]     pivots  The pivot indices that define the permutation matrix \(P\) as computed by getrf.
[out]    B       On exit, the n-by-n inverse of the matrix \(A\).
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
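
A sketch of out-of-place usage (the wrapper name is illustrative, and B is assumed to be pre-allocated with a distribution conforming to A):

    #include <slate/slate.hh>

    // A already holds the L and U factors from slate::getrf; pivots holds P.
    // On return, B holds inv(A), while A and pivots are left unchanged.
    template <typename scalar_t>
    void invert_out_of_place( slate::Matrix<scalar_t>& A, slate::Pivots& pivots,
                              slate::Matrix<scalar_t>& B )
    {
        slate::getri( A, pivots, B,
                      { { slate::Option::Target, slate::Target::HostTask } } );
    }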

◆ getri() [2/2]

template<typename scalar_t >
void slate::getri ( Matrix< scalar_t >& A, Pivots& pivots, Options const& opts )

Distributed parallel LU inversion.

Computes the inverse of a matrix \(A\) using the LU factorization \(A = P L U\) computed by getrf.

Complexity (in real): \(\approx \frac{4}{3} n^{3}\) flops.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A       On entry, the factors \(L\) and \(U\) from the factorization \(A = P L U\) as computed by getrf. On exit, the inverse of the original matrix \(A\).
[in]     pivots  The pivot indices that define the permutation matrix \(P\) as computed by getrf.
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
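
For example, factor and then invert in place (a minimal sketch; matrix setup and error handling are omitted, and the wrapper name is illustrative):

    #include <slate/slate.hh>

    // After the two calls, A holds inv(A_original), assuming info == 0.
    template <typename scalar_t>
    int64_t invert_in_place( slate::Matrix<scalar_t>& A )
    {
        slate::Pivots pivots;
        int64_t info = slate::getrf( A, pivots, slate::Options() );  // A = P L U
        if (info == 0)
            slate::getri( A, pivots, slate::Options() );             // A := inv(A)
        return info;
    }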

◆ getrs()

template<typename scalar_t >
void slate::getrs ( Matrix< scalar_t >& A, Pivots& pivots, Matrix< scalar_t >& B, Options const& opts )

Distributed parallel LU solve.

Solves a system of linear equations

\[ A X = B \]

with a general n-by-n matrix \(A\) using the LU factorization computed by getrf. \(A\) can be transposed or conjugate-transposed.

Complexity (in real): \(\approx 2 n^{2} r\) flops, where \(r\) is the number of right-hand sides.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]     A       The factors \(L\) and \(U\) from the factorization \(A = P L U\) as computed by getrf.
[in]     pivots  The pivot indices that define the permutation matrix \(P\) as computed by getrf.
[in,out] B       On entry, the n-by-nrhs right hand side matrix \(B\). On exit, the n-by-nrhs solution matrix \(X\).
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
  • Option::MethodLU: Algorithm for LU factorization.
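
A sketch of the solve step after getrf (the wrapper name is illustrative; the transpose note in the comment follows the description above):

    #include <slate/slate.hh>

    // Solve A X = B using the factors and pivots produced by slate::getrf.
    // B is overwritten with the solution X. A transposed or conjugate-transposed
    // view (via slate::transpose / slate::conj_transpose) may also be passed.
    template <typename scalar_t>
    void lu_solve( slate::Matrix<scalar_t>& A, slate::Pivots& pivots,
                   slate::Matrix<scalar_t>& B )
    {
        slate::getrs( A, pivots, B,
                      { { slate::Option::Lookahead, 1 } } );
    }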

◆ getrs_nopiv()

template<typename scalar_t >
void slate::getrs_nopiv ( Matrix< scalar_t >& A, Matrix< scalar_t >& B, Options const& opts )
Deprecated:
Use getrs( A, pivots, B, { Option::MethodLU, MethodLU::NoPiv } ).

Distributed parallel LU solve.

Solves a system of linear equations

\[ A X = B \]

with a general n-by-n matrix \(A\) using the LU factorization computed by getrf_nopiv. \(A\) can be transposed or conjugate-transposed.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]     A     The factors \(L\) and \(U\) from the factorization \(A = L U\) as computed by getrf_nopiv.
[in,out] B     On entry, the n-by-nrhs right hand side matrix \(B\). On exit, the n-by-nrhs solution matrix \(X\).
[in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
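
Since this routine is deprecated, a sketch of the replacement recommended by the note above (the wrapper name is illustrative, and passing an unused pivot vector plus MethodLU::NoPiv as an option value follows the deprecation note rather than a verified signature):

    #include <slate/slate.hh>

    // No-pivot solve through the current getrs interface, selecting the
    // no-pivot method via Option::MethodLU per the deprecation note.
    template <typename scalar_t>
    void solve_nopiv( slate::Matrix<scalar_t>& A, slate::Matrix<scalar_t>& B )
    {
        slate::Pivots pivots;  // not used by the no-pivot method (assumption)
        slate::getrs( A, pivots, B,
                      { { slate::Option::MethodLU, slate::MethodLU::NoPiv } } );
    }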