SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Loading...
Searching...
No Matches

Solve \(AX = B\). More...

Functions

template<typename scalar_t >
int64_t slate::posv (HermitianMatrix< scalar_t > &A, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel Cholesky factorization and solve.
 
template<typename scalar_hi , typename scalar_lo >
int64_t slate::posv_mixed (HermitianMatrix< scalar_hi > &A, Matrix< scalar_hi > &B, Matrix< scalar_hi > &X, int &iter, Options const &opts)
 Distributed parallel iterative-refinement Cholesky factorization and solve.
 
template<typename scalar_hi , typename scalar_lo >
int64_t slate::posv_mixed_gmres (HermitianMatrix< scalar_hi > &A, Matrix< scalar_hi > &B, Matrix< scalar_hi > &X, int &iter, Options const &opts)
 Distributed parallel GMRES-IR Cholesky factorization and solve.
 

Detailed Description

Solve \(AX = B\).

Function Documentation

◆ posv()

template<typename scalar_t >
int64_t slate::posv ( HermitianMatrix< scalar_t > &  A,
Matrix< scalar_t > &  B,
Options const &  opts 
)

Distributed parallel Cholesky factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n Hermitian positive definite matrix and \(X\) and \(B\) are n-by-nrhs matrices. The Cholesky decomposition is used to factor \(A\) as

\[ A = L L^H, \]

if \(A\) is stored lower, where \(L\) is a lower triangular matrix, or

\[ A = U^H U, \]

if \(A\) is stored upper, where \(U\) is an upper triangular matrix. The factored form of \(A\) is then used to solve the system of equations \(A X = B\).

Complexity (in real): \(\approx \frac{1}{3} n^{3} + 2 n^{2} r\) flops.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the n-by-n Hermitian positive definite matrix \(A\). On exit, if return value = 0, overwritten by the factor \(U\) or \(L\) from the Cholesky factorization \(A = U^H U\) or \(A = L L^H\). If scalar_t is real, \(A\) can be a SymmetricMatrix object.
[in,out]BOn entry, the n-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: the leading minor of order \(i\) of \(A\) is not positive definite, so the factorization could not be completed, and the solution has not been computed.

◆ posv_mixed()

template<typename scalar_hi , typename scalar_lo >
int64_t slate::posv_mixed ( HermitianMatrix< scalar_hi > &  A,
Matrix< scalar_hi > &  B,
Matrix< scalar_hi > &  X,
int &  iter,
Options const &  opts 
)

Distributed parallel iterative-refinement Cholesky factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n Hermitian positive definite matrix and \(X\) and \(B\) are n-by-nrhs matrices.

posv_mixed first factorizes the matrix using potrf in low precision (single) and uses this factorization within an iterative refinement procedure to produce a solution with high precision (double) normwise backward error quality (see below). If the approach fails, the method falls back to a high precision (double) factorization and solve.

The iterative refinement is not going to be a winning strategy if the ratio of low-precision performance over high-precision performance is too small. A reasonable strategy should take the number of right-hand sides and the size of the matrix into account. This might be automated in the future. Up to now, we always try iterative refinement.

The iterative refinement process is stopped if iter > itermax or for all the RHS, \(1 \le j \le nrhs\), we have: \(\norm{r_j}_{inf} < tol \norm{x_j}_{inf} \norm{A}_{inf},\) where:

  • iter is the number of the current iteration in the iterative refinement process
  • \(\norm{r_j}_{inf}\) is the infinity-norm of the residual, \(r_j = Ax_j - b_j\)
  • \(\norm{x_j}_{inf}\) is the infinity-norm of the solution
  • \(\norm{A}_{inf}\) is the infinity-operator-norm of the matrix \(A\)
    Template Parameters
    scalar_hiOne of double, std::complex<double>.
    scalar_loOne of float, std::complex<float>.
    Parameters
    [in,out]AOn entry, the n-by-n Hermitian positive definite matrix \(A\). On exit, if iterative refinement has been successfully used (return value = 0 and iter >= 0, see description below), then \(A\) is unchanged. If high precision (double) factorization has been used (return value = 0 and iter < 0, see description below), then the array \(A\) contains the factor \(U\) or \(L\) from the Cholesky factorization \(A = U^H U\) or \(A = L L^H\). If scalar_t is real, \(A\) can be a SymmetricMatrix.
    [in]BOn entry, the n-by-nrhs right hand side matrix \(B\).
    [out]XOn exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
    [out]iter> 0: The number of the iterations the iterative refinement process needed for convergence. < 0: Iterative refinement failed; it falls back to a double precision factorization and solve. -3: single precision matrix was not positive definite in potrf. -(itermax+1): iterative refinement failed to converge in itermax iterations.
    [in]optsAdditional options, as map of name = value pairs. Possible options:
    • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
    • Option::Target: Implementation to target. Possible values:
      • HostTask: OpenMP tasks on CPU host [default].
      • HostNest: nested OpenMP parallel for loop on CPU host.
      • HostBatch: batched BLAS on CPU host.
      • Devices: batched BLAS on GPU device.
    • Option::Tolerance: Iterative refinement tolerance. Default epsilon * sqrt(m)
    • Option::MaxIterations: Maximum number of refinement iterations. Default 30
    • Option::UseFallbackSolver: If true and iterative refinement fails to converge, the problem is resolved with partial-pivoted LU. Default true
    Returns
    0: successful exit
    i > 0: the leading minor of order \(i\) of \(A\) is not positive definite, so the factorization could not be completed, and the solution has not been computed.

◆ posv_mixed_gmres()

template<typename scalar_hi , typename scalar_lo >
int64_t slate::posv_mixed_gmres ( HermitianMatrix< scalar_hi > &  A,
Matrix< scalar_hi > &  B,
Matrix< scalar_hi > &  X,
int &  iter,
Options const &  opts 
)

Distributed parallel GMRES-IR Cholesky factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n Hermitian positive definite matrix and \(X\) and \(B\) are n-by-nrhs matrices.

posv_mixed_gmres first factorizes the matrix using potrf in low precision (single) and uses this factorization within a GMRES-IR procedure to produce a solution with high precision (double) normwise backward error quality (see below). If the approach fails, the method falls back to a high precision (double) factorization and solve.

GMRES-IR is not going to be a winning strategy if the ratio of low-precision performance over high-precision performance is too small. A reasonable strategy should take the number of right-hand sides and the the size of the matrix into account. This might be automated in the future. Up to now, we always try iterative refinement.

GMRES-IR process is stopped if iter > itermax or for all the RHS, \(1 \le j \le nrhs\), we have: \(\norm{r_j}_{inf} < tol \norm{x_j}_{inf} \norm{A}_{inf},\) where:

  • iter is the number of the current iteration in the iterative refinement process
  • \(\norm{r_j}_{inf}\) is the infinity-norm of the residual, \(r_j = Ax_j - b_j\)
  • \(\norm{x_j}_{inf}\) is the infinity-norm of the solution
  • \(\norm{A}_{inf}\) is the infinity-operator-norm of the matrix \(A\)
    Template Parameters
    scalar_hiOne of double, std::complex<double>.
    scalar_loOne of float, std::complex<float>.
    Parameters
    [in,out]AOn entry, the n-by-n Hermitian positive definite matrix \(A\). On exit, if iterative refinement has been successfully used (return value = 0 and iter >= 0, see description below), then \(A\) is unchanged. If high precision (double) factorization has been used (return value = 0 and iter < 0, see description below), then the array \(A\) contains the factors \(L\) or \(U\) from the Cholesky factorization \(A = U^H U\) or \(A = L L^H\).
    [in]BOn entry, the n-by-nrhs right hand side matrix \(B\).
    [out]XOn exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
    [out]iter
    [out]iter> 0: The number of the iterations the iterative refinement process needed for convergence. < 0: Iterative refinement failed; it falls back to a double precision factorization and solve. -3: single precision matrix was exactly singular in getrf. -(itermax+1): iterative refinement failed to converge in itermax iterations.
    [in]optsAdditional options, as map of name = value pairs. Possible options:
    • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
    • Option::MaxIterations The iteration limit for refinement. Default 30.
    • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
    • Option::Target: Implementation to target. Possible values:
      • HostTask: OpenMP tasks on CPU host [default].
      • HostNest: nested OpenMP parallel for loop on CPU host.
      • HostBatch: batched BLAS on CPU host.
      • Devices: batched BLAS on GPU device.
    • Option::Tolerance: Iterative refinement tolerance. Default epsilon * sqrt(m)
    • Option::MaxIterations: Maximum number of refinement iterations. Default 30
    • Option::UseFallbackSolver: If true and iterative refinement fails to converge, the problem is resolved with partial-pivoted LU. Default true
    Returns
    0: successful exit
    i > 0: the leading minor of order \(i\) of \(A\) is not positive definite, so the factorization could not be completed, and the solution has not been computed.