Solve \(AX = B\). More...

Functions
template<typename scalar_t >
int64_t	slate::hesv (HermitianMatrix< scalar_t > &A, Pivots &pivots, BandMatrix< scalar_t > &T, Pivots &pivots2, Matrix< scalar_t > &H, Matrix< scalar_t > &B, Options const &opts)
	Distributed parallel Hermitian indefinite \(LTL^T\) factorization and solve.

Detailed Description

Solve \(AX = B\).

Function Documentation

◆ hesv()

template<typename scalar_t >

int64_t slate::hesv	(	HermitianMatrix< scalar_t > &	A,
		Pivots &	pivots,
		BandMatrix< scalar_t > &	T,
		Pivots &	pivots2,
		Matrix< scalar_t > &	H,
		Matrix< scalar_t > &	B,
		Options const &	opts
	)

Distributed parallel Hermitian indefinite \(LTL^T\) factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n Hermitian matrix and \(X\) and \(B\) are n-by-nrhs matrices.

Aasen's 2-stage algorithm is used to factor \(A\) as

\[ A = L T L^H, \]

if \(A\) is stored lower, or

\[ A = U^H T U, \]

if \(A\) is stored upper. \(U\) (or \(L\)) is a product of permutation and unit upper (lower) triangular matrices, and \(T\) is Hermitian and banded. The matrix \(T\) is then LU-factored with partial pivoting. The factored form of \(A\) is then used to solve the system of equations \(A X = B\).

This is the blocked version of the algorithm, calling Level 3 BLAS.

Complexity (in real): \(\approx \frac{1}{3} n^{3} + 2 n^{2} r\) flops.

Template Parameters

scalar_t One of float, double, std::complex<float>, std::complex<double>.

Parameters

[in,out]	A	On entry, the n-by-n Hermitian matrix \(A\). On exit, if return value = 0, overwritten by the factor \(U\) or \(L\) from the factorization \(A = U^H T U\) or \(A = L T L^H\). If scalar_t is real, \(A\) can be a SymmetricMatrix object.
[out]	pivots	On exit, details of the interchanges applied to \(A\), i.e., row and column k of \(A\) were swapped with row and column pivots(k).
[out]	T	On exit, details of the LU factorization of the band matrix.
[out]	pivots2	On exit, details of the interchanges applied to \(T\), i.e., row and column k of \(T\) were swapped with row and column pivots2(k).
[out]	H	Auxiliary matrix used during the factorization. TODO: can this be made internal?
[in,out]	B	On entry, the n-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]	opts	Additional options, as map of name = value pairs. Possible options: Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1. Option::InnerBlocking: Inner blocking to use for panel. Default 16. Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2. Option::Target: Implementation to target. Possible values: HostTask: OpenMP tasks on CPU host [default]. HostNest: nested OpenMP parallel for loop on CPU host. HostBatch: batched BLAS on CPU host. Devices: batched BLAS on GPU device.

Returns: 0: successful exit; i > 0: the band LU factorization failed on the \(i\)-th column.

Functions

Detailed Description

Function Documentation

◆ hesv()