SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Loading...
Searching...
No Matches

Solve \(AX = B\). More...

Functions

template<typename scalar_t >
int64_t slate::hesv (HermitianMatrix< scalar_t > &A, Pivots &pivots, BandMatrix< scalar_t > &T, Pivots &pivots2, Matrix< scalar_t > &H, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel Hermitian indefinite \(LTL^T\) factorization and solve.
 

Detailed Description

Solve \(AX = B\).

Function Documentation

◆ hesv()

template<typename scalar_t >
int64_t slate::hesv ( HermitianMatrix< scalar_t > &  A,
Pivots &  pivots,
BandMatrix< scalar_t > &  T,
Pivots &  pivots2,
Matrix< scalar_t > &  H,
Matrix< scalar_t > &  B,
Options const &  opts 
)

Distributed parallel Hermitian indefinite \(LTL^T\) factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n Hermitian matrix and \(X\) and \(B\) are n-by-nrhs matrices.

Aasen's 2-stage algorithm is used to factor \(A\) as

\[ A = L T L^H, \]

if \(A\) is stored lower, or

\[ A = U^H T U, \]

if \(A\) is stored upper. \(U\) (or \(L\)) is a product of permutation and unit upper (lower) triangular matrices, and \(T\) is Hermitian and banded. The matrix \(T\) is then LU-factored with partial pivoting. The factored form of \(A\) is then used to solve the system of equations \(A X = B\).

This is the blocked version of the algorithm, calling Level 3 BLAS.

Complexity (in real): \(\approx \frac{1}{3} n^{3} + 2 n^{2} r\) flops.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the n-by-n Hermitian matrix \(A\). On exit, if return value = 0, overwritten by the factor \(U\) or \(L\) from the factorization \(A = U^H T U\) or \(A = L T L^H\). If scalar_t is real, \(A\) can be a SymmetricMatrix object.
[out]pivotsOn exit, details of the interchanges applied to \(A\), i.e., row and column k of \(A\) were swapped with row and column pivots(k).
[out]TOn exit, details of the LU factorization of the band matrix.
[out]pivots2On exit, details of the interchanges applied to \(T\), i.e., row and column k of \(T\) were swapped with row and column pivots2(k).
[out]HAuxiliary matrix used during the factorization. TODO: can this be made internal?
[in,out]BOn entry, the n-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: the band LU factorization failed on the \(i\)-th column.