SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
|
Solve \(AX = B\). More...
Functions | |
template<typename scalar_t > | |
int64_t | slate::gesv (Matrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts) |
Distributed parallel LU factorization and solve. | |
template<typename scalar_hi , typename scalar_lo > | |
int64_t | slate::gesv_mixed (Matrix< scalar_hi > &A, Pivots &pivots, Matrix< scalar_hi > &B, Matrix< scalar_hi > &X, int &iter, Options const &opts) |
Distributed parallel iterative-refinement LU factorization and solve. | |
template<typename scalar_hi , typename scalar_lo > | |
int64_t | slate::gesv_mixed_gmres (Matrix< scalar_hi > &A, Pivots &pivots, Matrix< scalar_hi > &B, Matrix< scalar_hi > &X, int &iter, Options const &opts) |
Distributed parallel GMRES-IR LU factorization and solve. | |
template<typename scalar_t > | |
int64_t | slate::gesv_nopiv (Matrix< scalar_t > &A, Matrix< scalar_t > &B, Options const &opts) |
Distributed parallel LU factorization and solve. | |
template<typename scalar_t > | |
void | slate::gesv_rbt (Matrix< scalar_t > &A, Matrix< scalar_t > &B, Matrix< scalar_t > &X, int &iter, Options const &opts) |
Distributed parallel LU factorization and solve. | |
Solve \(AX = B\).
int64_t slate::gesv | ( | Matrix< scalar_t > & | A, |
Pivots & | pivots, | ||
Matrix< scalar_t > & | B, | ||
Options const & | opts | ||
) |
Distributed parallel LU factorization and solve.
Computes the solution to a system of linear equations
\[ A X = B, \]
where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor \(A\) as
\[ A = P L U, \]
where \(P\) is a permutation matrix, \(L\) is unit lower triangular, and \(U\) is upper triangular. The factored form of \(A\) is then used to solve the system of equations \(A X = B\).
Complexity (in real): \(\approx \frac{2}{3} n^{3} + 2 n^{2} r\) flops.
scalar_t | One of float, double, std::complex<float>, std::complex<double>. |
[in,out] | A | On entry, the n-by-n matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = P L U\); the unit diagonal elements of \(L\) are not stored. |
[out] | pivots | The pivot indices that define the permutation matrix \(P\). |
[in,out] | B | On entry, the n-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\). |
[in] | opts | Additional options, as map of name = value pairs. Possible options: |
Option::MethodLU: Algorithm for LU factorization.
int64_t slate::gesv_mixed | ( | Matrix< scalar_hi > & | A, |
Pivots & | pivots, | ||
Matrix< scalar_hi > & | B, | ||
Matrix< scalar_hi > & | X, | ||
int & | iter, | ||
Options const & | opts | ||
) |
Distributed parallel iterative-refinement LU factorization and solve.
Computes the solution to a system of linear equations
\[ A X = B, \]
where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.
gesv_mixed first factorizes the matrix using getrf in low precision (single) and uses this factorization within an iterative refinement procedure to produce a solution with high precision (double) normwise backward error quality (see below). If the approach fails, the method falls back to a high precision (double) factorization and solve.
The iterative refinement is not going to be a winning strategy if the ratio of low-precision performance over high-precision performance is too small. A reasonable strategy should take the number of right-hand sides and the size of the matrix into account. This might be automated in the future. Up to now, we always try iterative refinement.
The iterative refinement process is stopped if iter > itermax or for all the RHS, \(1 \le j \le nrhs\), we have: \(\norm{r_j}_{inf} < tol \norm{x_j}_{inf} \norm{A}_{inf},\) where:
scalar_hi | One of double, std::complex<double>. |
scalar_lo | One of float, std::complex<float>. |
[in,out] | A | On entry, the n-by-n matrix \(A\). On exit, if iterative refinement has been successfully used (return value = 0 and iter >= 0, see description below), then \(A\) is unchanged. If high precision (double) factorization has been used (return value = 0 and iter < 0, see description below), then the array \(A\) contains the factors \(L\) and \(U\) from the factorization \(A = P L U\). |
[out] | pivots | The pivot indices that define the permutation matrix \(P\). |
[in] | B | On entry, the n-by-nrhs right hand side matrix \(B\). |
[out] | X | On exit, if return value = 0, the n-by-nrhs solution matrix \(X\). |
[out] | iter | > 0: The number of the iterations the iterative refinement process needed for convergence. < 0: Iterative refinement failed; it falls back to a double precision factorization and solve. -3: single precision matrix was exactly singular in getrf. -(itermax+1): iterative refinement failed to converge in itermax iterations. |
[in] | opts | Additional options, as map of name = value pairs. Possible options:
|
int64_t slate::gesv_mixed_gmres | ( | Matrix< scalar_hi > & | A, |
Pivots & | pivots, | ||
Matrix< scalar_hi > & | B, | ||
Matrix< scalar_hi > & | X, | ||
int & | iter, | ||
Options const & | opts | ||
) |
Distributed parallel GMRES-IR LU factorization and solve.
Computes the solution to a system of linear equations
\[ A X = B, \]
where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.
gesv_mixed_gmres first factorizes the matrix using getrf in low precision (single) and uses this factorization within a GMRES-IR procedure to produce a solution with high precision (double) normwise backward error quality (see below). If the approach fails, the method falls back to a high precision (double) factorization and solve.
GMRES-IR is not going to be a winning strategy if the ratio of low-precision performance over high-precision performance is too small. A reasonable strategy should take the number of right-hand sides and the the size of the matrix into account. This might be automated in the future. Up to now, we always try iterative refinement.
GMRES-IR process is stopped if iter > itermax or for all the RHS, \(1 \le j \le nrhs\), we have: \(\norm{r_j}_{inf} < tol \norm{x_j}_{inf} \norm{A}_{inf},\) where:
scalar_hi | One of double, std::complex<double>. |
scalar_lo | One of float, std::complex<float>. |
[in,out] | A | On entry, the n-by-n matrix \(A\). On exit, if iterative refinement has been successfully used (return value = 0 and iter >= 0, see description below), then \(A\) is unchanged. If high precision (double) factorization has been used (return value = 0 and iter < 0, see description below), then the array \(A\) contains the factors \(L\) and \(U\) from the factorization \(A = P L U\). |
[out] | pivots | The pivot indices that define the permutation matrix \(P\). |
[in] | B | On entry, the n-by-nrhs right hand side matrix \(B\). |
[out] | X | On exit, if return value = 0, the n-by-nrhs solution matrix \(X\). |
[out] | iter | |
[out] | iter | > 0: The number of the iterations the iterative refinement process needed for convergence. < 0: Iterative refinement failed; it falls back to a double precision factorization and solve. -3: single precision matrix was exactly singular in getrf. -(itermax+1): iterative refinement failed to converge in itermax iterations. |
[in] | opts | Additional options, as map of name = value pairs. Possible options:
|
int64_t slate::gesv_nopiv | ( | Matrix< scalar_t > & | A, |
Matrix< scalar_t > & | B, | ||
Options const & | opts | ||
) |
Distributed parallel LU factorization and solve.
Computes the solution to a system of linear equations
\[ A X = B, \]
where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.
The LU decomposition with no pivoting
\[ A = L U, \]
where \(L\) is unit lower triangular, and \(U\) is upper triangular. The factored form of \(A\) is then used to solve the system of equations \(A X = B\).
scalar_t | One of float, double, std::complex<float>, std::complex<double>. |
[in,out] | A | On entry, the n-by-n matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = L U\); the unit diagonal elements of \(L\) are not stored. |
[in,out] | B | On entry, the n-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\). |
[in] | opts | Additional options, as map of name = value pairs. Possible options:
|
void slate::gesv_rbt | ( | Matrix< scalar_t > & | A, |
Matrix< scalar_t > & | B, | ||
Matrix< scalar_t > & | X, | ||
int & | iter, | ||
Options const & | opts | ||
) |
Distributed parallel LU factorization and solve.
Computes the solution to a system of linear equations
\[ A X = B, \]
where \(A\) is an n-by-n matrix and \(X\) and \(B\) are n-by-nrhs matrices.
gesv_rbt first transforms the matrix with random butterfly transforms, factorizes the transformed matrix using getrf_nopiv, and uses this factorization within an iterative refinement procedure to produce a solution with full normwise backward error quality (see below). If the approach fails and the UseFallbackSolver is true, the problem is re-solved with a partial pivoted factorization.
The iterative refinement process is stopped if iter > itermax or for all the RHS, \(1 \le j \le nrhs\), we have: \(\norm{r_j}_{inf} < tol \norm{x_j}_{inf} \norm{A}_{inf},\) where:
scalar_t | One of float, double, std::complex<float>, std::complex<double>. |
[in,out] | A | On entry, the n-by-n matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = P L U\); the unit diagonal elements of \(L\) are not stored. |
[out] | pivots | The pivot indices that define the permutation matrix \(P\). |
[in] | B | On entry, the n-by-nrhs right hand side matrix \(B\). |
[out] | X | On exit, if return value = 0, the n-by-nrhs solution matrix \(X\). |
[out] | iter | The number of the iterations in the iterative refinement process, needed for the convergence. If iterative refinement failed, it is set to -(1+itermax), regardless of whether the fallback solver was used. |
[in] | opts | Additional options, as map of name = value pairs. Possible options:
|
0 | successful exit |
>0 | for return value = \(i\), the computed \(U(i,i)\) is exactly zero. The factorization has been completed, but the factor U is exactly singular, so the solution could not be computed. |