SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale

Factor, forward and back solve. More...

Functions

template<typename scalar_t >
int64_t slate::hetrf (HermitianMatrix< scalar_t > &A, Pivots &pivots, BandMatrix< scalar_t > &T, Pivots &pivots2, Matrix< scalar_t > &H, Options const &opts)
 Distributed parallel Hermitian indefinite \(LTL^H\) factorization.
 
template<typename scalar_t >
void slate::hetrs (HermitianMatrix< scalar_t > &A, Pivots &pivots, BandMatrix< scalar_t > &T, Pivots &pivots2, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel Hermitian indefinite \(LTL^H\) solve.
 

Detailed Description

Factor, forward and back solve.

Function Documentation

◆ hetrf()

template<typename scalar_t >
int64_t slate::hetrf(HermitianMatrix<scalar_t>& A,
                     Pivots& pivots,
                     BandMatrix<scalar_t>& T,
                     Pivots& pivots2,
                     Matrix<scalar_t>& H,
                     Options const& opts)

Distributed parallel Hermitian indefinite \(LTL^H\) factorization.

Computes the factorization of a Hermitian matrix \(A\) using Aasen's 2-stage algorithm. The form of the factorization is

\[ A = L T L^H, \]

if \(A\) is stored lower, where \(L\) is a product of permutation and unit lower triangular matrices, or

\[ A = U^H T U, \]

if \(A\) is stored upper, where \(U\) is a product of permutation and unit upper triangular matrices. \(T\) is a Hermitian band matrix that is LU factorized with partial pivoting.

Complexity (in real): \(\approx \frac{1}{3} n^{3}\) flops.

Template Parameters
scalar_t    One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]  A        On entry, the n-by-n Hermitian matrix \(A\). On exit, if return value = 0, overwritten by the factor \(U\) or \(L\) from the factorization \(A = U^H T U\) or \(A = L T L^H\). If scalar_t is real, \(A\) can be a SymmetricMatrix object.
[out]     pivots   On exit, details of the interchanges applied to \(A\), i.e., row and column k of \(A\) were swapped with row and column pivots(k).
[out]     T        On exit, details of the LU factorization of the band matrix.
[out]     pivots2  On exit, details of the interchanges applied to \(T\), i.e., row and column k of \(T\) were swapped with row and column pivots2(k).
[out]     H        Auxiliary matrix used during the factorization. TODO: can this be made internal?
[in]      opts     Additional options, as a map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: the band LU factorization failed on the \(i\)-th column.

◆ hetrs()

template<typename scalar_t >
void slate::hetrs(HermitianMatrix<scalar_t>& A,
                  Pivots& pivots,
                  BandMatrix<scalar_t>& T,
                  Pivots& pivots2,
                  Matrix<scalar_t>& B,
                  Options const& opts)

Distributed parallel Hermitian indefinite \(LTL^H\) solve.

Solves a system of linear equations \(A X = B\) with a Hermitian matrix \(A\) using the factorization \(A = U^H T U\) or \(A = L T L^H\) computed by hetrf.

Complexity (in real): \(2 n^{2} r\) flops, where \(r\) is the number of right-hand sides (nrhs).

Template Parameters
scalar_t    One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]  A        Details of the factors \(U\) or \(L\) as computed by hetrf. If scalar_t is real, \(A\) can be a SymmetricMatrix object.
[in]      pivots   Details of the interchanges applied to \(A\) as computed by hetrf.
[in]      T        Details of the LU factorization of the band matrix as computed by hetrf.
[in]      pivots2  Details of the interchanges applied to \(T\) as computed by hetrf.
[in,out]  B        On entry, the n-by-nrhs right-hand side matrix \(B\). On exit, the n-by-nrhs solution matrix \(X\).
[in]      opts     Additional options, as a map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.