SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Loading...
Searching...
No Matches

Solve \(AX = B\). More...

Functions

template<typename scalar_t >
int64_t slate::gesv (Matrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel LU factorization and solve.
 
template<typename scalar_hi , typename scalar_lo >
int64_t slate::gesv_mixed (Matrix< scalar_hi > &A, Pivots &pivots, Matrix< scalar_hi > &B, Matrix< scalar_hi > &X, int &iter, Options const &opts)
 Distributed parallel iterative-refinement LU factorization and solve.
 
template<typename scalar_hi , typename scalar_lo >
int64_t slate::gesv_mixed_gmres (Matrix< scalar_hi > &A, Pivots &pivots, Matrix< scalar_hi > &B, Matrix< scalar_hi > &X, int &iter, Options const &opts)
 Distributed parallel GMRES-IR LU factorization and solve.
 
template<typename scalar_t >
int64_t slate::gesv_nopiv (Matrix< scalar_t > &A, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel LU factorization and solve.
 
template<typename scalar_t >
void slate::gesv_rbt (Matrix< scalar_t > &A, Matrix< scalar_t > &B, Matrix< scalar_t > &X, int &iter, Options const &opts)
 Distributed parallel LU factorization and solve.
 

Detailed Description

Solve \(AX = B\).

Function Documentation

◆ gesv()

template<typename scalar_t >
int64_t slate::gesv ( Matrix< scalar_t > &  A,
Pivots &  pivots,
Matrix< scalar_t > &  B,
Options const &  opts 
)

Distributed parallel LU factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor \(A\) as

\[ A = P L U, \]

where \(P\) is a permutation matrix, \(L\) is unit lower triangular, and \(U\) is upper triangular. The factored form of \(A\) is then used to solve the system of equations \(A X = B\).

Complexity (in real): \(\approx \frac{2}{3} n^{3} + 2 n^{2} r\) flops.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the n-by-n matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = P L U\); the unit diagonal elements of \(L\) are not stored.
[out]pivotsThe pivot indices that define the permutation matrix \(P\).
[in,out]BOn entry, the n-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.

Option::MethodLU: Algorithm for LU factorization.

  • PPLU: partial pivoting [default].
  • CALU: communication avoiding.
  • NoPiv: no pivoting. Note pivots vector is currently ignored for NoPiv.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, so the solution could not be computed.

◆ gesv_mixed()

template<typename scalar_hi , typename scalar_lo >
int64_t slate::gesv_mixed ( Matrix< scalar_hi > &  A,
Pivots &  pivots,
Matrix< scalar_hi > &  B,
Matrix< scalar_hi > &  X,
int &  iter,
Options const &  opts 
)

Distributed parallel iterative-refinement LU factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.

gesv_mixed first factorizes the matrix using getrf in low precision (single) and uses this factorization within an iterative refinement procedure to produce a solution with high precision (double) normwise backward error quality (see below). If the approach fails, the method falls back to a high precision (double) factorization and solve.

The iterative refinement is not going to be a winning strategy if the ratio of low-precision performance over high-precision performance is too small. A reasonable strategy should take the number of right-hand sides and the size of the matrix into account. This might be automated in the future. Up to now, we always try iterative refinement.

The iterative refinement process is stopped if iter > itermax or for all the RHS, \(1 \le j \le nrhs\), we have: \(\norm{r_j}_{inf} < tol \norm{x_j}_{inf} \norm{A}_{inf},\) where:

  • iter is the number of the current iteration in the iterative refinement process
  • \(\norm{r_j}_{inf}\) is the infinity-norm of the residual, \(r_j = Ax_j - b_j\)
  • \(\norm{x_j}_{inf}\) is the infinity-norm of the solution
  • \(\norm{A}_{inf}\) is the infinity-operator-norm of the matrix \(A\)
    Template Parameters
    scalar_hiOne of double, std::complex<double>.
    scalar_loOne of float, std::complex<float>.
    Parameters
    [in,out]AOn entry, the n-by-n matrix \(A\). On exit, if iterative refinement has been successfully used (return value = 0 and iter >= 0, see description below), then \(A\) is unchanged. If high precision (double) factorization has been used (return value = 0 and iter < 0, see description below), then the array \(A\) contains the factors \(L\) and \(U\) from the factorization \(A = P L U\).
    [out]pivotsThe pivot indices that define the permutation matrix \(P\).
    [in]BOn entry, the n-by-nrhs right hand side matrix \(B\).
    [out]XOn exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
    [out]iter> 0: The number of the iterations the iterative refinement process needed for convergence. < 0: Iterative refinement failed; it falls back to a double precision factorization and solve. -3: single precision matrix was exactly singular in getrf. -(itermax+1): iterative refinement failed to converge in itermax iterations.
    [in]optsAdditional options, as map of name = value pairs. Possible options:
    • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
    • Option::Target: Implementation to target. Possible values:
      • HostTask: OpenMP tasks on CPU host [default].
      • HostNest: nested OpenMP parallel for loop on CPU host.
      • HostBatch: batched BLAS on CPU host.
      • Devices: batched BLAS on GPU device.
    • Option::Tolerance: Iterative refinement tolerance. Default epsilon * sqrt(m)
    • Option::MaxIterations: Maximum number of refinement iterations. Default 30
    • Option::UseFallbackSolver: If true and iterative refinement fails to converge, the problem is resolved with partial-pivoted LU. Default true
    Returns
    0: successful exit
    i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, so the solution could not be computed.

◆ gesv_mixed_gmres()

template<typename scalar_hi , typename scalar_lo >
int64_t slate::gesv_mixed_gmres ( Matrix< scalar_hi > &  A,
Pivots &  pivots,
Matrix< scalar_hi > &  B,
Matrix< scalar_hi > &  X,
int &  iter,
Options const &  opts 
)

Distributed parallel GMRES-IR LU factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.

gesv_mixed_gmres first factorizes the matrix using getrf in low precision (single) and uses this factorization within a GMRES-IR procedure to produce a solution with high precision (double) normwise backward error quality (see below). If the approach fails, the method falls back to a high precision (double) factorization and solve.

GMRES-IR is not going to be a winning strategy if the ratio of low-precision performance over high-precision performance is too small. A reasonable strategy should take the number of right-hand sides and the the size of the matrix into account. This might be automated in the future. Up to now, we always try iterative refinement.

GMRES-IR process is stopped if iter > itermax or for all the RHS, \(1 \le j \le nrhs\), we have: \(\norm{r_j}_{inf} < tol \norm{x_j}_{inf} \norm{A}_{inf},\) where:

  • iter is the number of the current iteration in the iterative refinement process
  • \(\norm{r_j}_{inf}\) is the infinity-norm of the residual, \(r_j = Ax_j - b_j\)
  • \(\norm{x_j}_{inf}\) is the infinity-norm of the solution
  • \(\norm{A}_{inf}\) is the infinity-operator-norm of the matrix \(A\)
    Template Parameters
    scalar_hiOne of double, std::complex<double>.
    scalar_loOne of float, std::complex<float>.
    Parameters
    [in,out]AOn entry, the n-by-n matrix \(A\). On exit, if iterative refinement has been successfully used (return value = 0 and iter >= 0, see description below), then \(A\) is unchanged. If high precision (double) factorization has been used (return value = 0 and iter < 0, see description below), then the array \(A\) contains the factors \(L\) and \(U\) from the factorization \(A = P L U\).
    [out]pivotsThe pivot indices that define the permutation matrix \(P\).
    [in]BOn entry, the n-by-nrhs right hand side matrix \(B\).
    [out]XOn exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
    [out]iter
    [out]iter> 0: The number of the iterations the iterative refinement process needed for convergence. < 0: Iterative refinement failed; it falls back to a double precision factorization and solve. -3: single precision matrix was exactly singular in getrf. -(itermax+1): iterative refinement failed to converge in itermax iterations.
    [in]optsAdditional options, as map of name = value pairs. Possible options:
    • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
    • Option::MaxIterations The iteration limit for refinement. Default 30.
    • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
    • Option::Target: Implementation to target. Possible values:
      • HostTask: OpenMP tasks on CPU host [default].
      • HostNest: nested OpenMP parallel for loop on CPU host.
      • HostBatch: batched BLAS on CPU host.
      • Devices: batched BLAS on GPU device.
    • Option::Tolerance: Iterative refinement tolerance. Default epsilon * sqrt(m)
    • Option::MaxIterations: Maximum number of refinement iterations. Default 30
    • Option::UseFallbackSolver: If true and iterative refinement fails to converge, the problem is resolved with partial-pivoted LU. Default true
    Returns
    0: successful exit
    i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, so the solution could not be computed.

◆ gesv_nopiv()

template<typename scalar_t >
int64_t slate::gesv_nopiv ( Matrix< scalar_t > &  A,
Matrix< scalar_t > &  B,
Options const &  opts 
)

Distributed parallel LU factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.

The LU decomposition with no pivoting

\[ A = L U, \]

where \(L\) is unit lower triangular, and \(U\) is upper triangular. The factored form of \(A\) is then used to solve the system of equations \(A X = B\).

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the n-by-n matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = L U\); the unit diagonal elements of \(L\) are not stored.
[in,out]BOn entry, the n-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization will have NaN due to division by zero.

◆ gesv_rbt()

template<typename scalar_t >
void slate::gesv_rbt ( Matrix< scalar_t > &  A,
Matrix< scalar_t > &  B,
Matrix< scalar_t > &  X,
int &  iter,
Options const &  opts 
)

Distributed parallel LU factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.

gesv_rbt first transforms the matrix with random butterfly transforms, factorizes the transformed matrix using getrf_nopiv, and uses this factorization within an iterative refinement procedure to produce a solution with full normwise backward error quality (see below). If the approach fails and the UseFallbackSolver is true, the problem is re-solved with a partial pivoted factorization.

The iterative refinement process is stopped if iter > itermax or for all the RHS, \(1 \le j \le nrhs\), we have: \(\norm{r_j}_{inf} < tol \norm{x_j}_{inf} \norm{A}_{inf},\) where:

  • iter is the number of the current iteration in the iterative refinement process
  • \(\norm{r_j}_{inf}\) is the infinity-norm of the residual, \(r_j = Ax_j - b_j\)
  • \(\norm{x_j}_{inf}\) is the infinity-norm of the solution
  • \(\norm{A}_{inf}\) is the infinity-operator-norm of the matrix \(A\)
    Template Parameters
    scalar_tOne of float, double, std::complex<float>, std::complex<double>.
    Parameters
    [in,out]AOn entry, the n-by-n matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = P L U\); the unit diagonal elements of \(L\) are not stored.
    [out]pivotsThe pivot indices that define the permutation matrix \(P\).
    [in]BOn entry, the n-by-nrhs right hand side matrix \(B\).
    [out]XOn exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
    [out]iterThe number of the iterations in the iterative refinement process, needed for the convergence. If iterative refinement failed, it is set to -(1+itermax), regardless of whether the fallback solver was used.
    [in]optsAdditional options, as map of name = value pairs. Possible options:
    • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
    • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
    • Option::Target: Implementation to target. Possible values:
      • HostTask: OpenMP tasks on CPU host [default].
      • HostNest: nested OpenMP parallel for loop on CPU host.
      • HostBatch: batched BLAS on CPU host.
      • Devices: batched BLAS on GPU device.
    • Option::Depth: Depth for butterfly transform. Default 2
    • Option::Tolerance: Iterative refinement tolerance. Default epsilon * sqrt(m)
    • Option::MaxIterations: Maximum number of refinement iterations. Default 30
    • Option::UseFallbackSolver: If true and iterative refinement fails to converge, the problem is resolved with partial-pivoted LU. Default true
    TODO: return value
    Return values
    0successful exit
    >0for return value = \(i\), the computed \(U(i,i)\) is exactly zero. The factorization has been completed, but the factor U is exactly singular, so the solution could not be computed.