SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Loading...
Searching...
No Matches

Solve \(AX = B\). More...

Functions

template<typename scalar_t >
int64_t slate::gbsv (BandMatrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel band LU factorization and solve.
 

Detailed Description

Solve \(AX = B\).

Function Documentation

◆ gbsv()

template<typename scalar_t >
int64_t slate::gbsv ( BandMatrix< scalar_t > &  A,
Pivots &  pivots,
Matrix< scalar_t > &  B,
Options const &  opts 
)

Distributed parallel band LU factorization and solve.

Computes the solution to a system of linear equations

\[ A X = B, \]

where \(A\) is an n-by-n band matrix and \(X\) and \(B\) are n-by-nrhs matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor \(A\) as

\[ A = L U, \]

where \(L\) is a product of permutation and unit lower triangular matrices, and \(U\) is upper triangular. The factored form of \(A\) is then used to solve the system of equations \(A X = B\).

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the n-by-n band matrix \(A\) to be factored. Tiles outside the bandwidth do not need to exist. For tiles that are partially outside the bandwidth, data outside the bandwidth should be explicitly set to zero. On exit, the factors \(L\) and \(U\) from the factorization \(A = L U\); the unit diagonal elements of \(L\) are not stored. The upper bandwidth is increased to accommodate fill-in of \(U\).
[out]pivotsThe pivot indices that define the permutation matrix \(P\).
[in,out]BOn entry, the n-by-nrhs right hand side matrix \(B\). On exit, if return value = 0, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, so the solution could not be computed.