SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Loading...
Searching...
No Matches

Factor, forward and back solve. More...

Functions

template<typename scalar_t >
int64_t slate::gbtrf (BandMatrix< scalar_t > &A, Pivots &pivots, Options const &opts)
 Distributed parallel band LU factorization.
 
template<typename scalar_t >
void slate::gbtrs (BandMatrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel band LU solve.
 

Detailed Description

Factor, forward and back solve.

Function Documentation

◆ gbtrf()

template<typename scalar_t >
int64_t slate::gbtrf ( BandMatrix< scalar_t > &  A,
Pivots &  pivots,
Options const &  opts 
)

Distributed parallel band LU factorization.

Computes an LU factorization of a general band m-by-n matrix \(A\) using partial pivoting with row interchanges.

The factorization has the form

\[ A = L U \]

where \(L\) is a product of permutation and unit lower triangular matrices, and \(U\) is upper triangular.

This is the right-looking Level 3 BLAS version of the algorithm.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the band matrix \(A\) to be factored. Tiles outside the bandwidth do not need to exist. For tiles that are partially outside the bandwidth, data outside the bandwidth should be explicitly set to zero. On exit, the factors \(L\) and \(U\) from the factorization \(A = L U\); the unit diagonal elements of \(L\) are not stored. The upper bandwidth is increased to accommodate fill-in of \(U\).
[out]pivotsThe pivot indices that define the permutations.
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.

Option::PivotThreshold: Strictness of the pivot selection. Between 0 and 1 with 1 giving partial pivoting and 0 giving no pivoting. Default 1.

Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, and division by zero will occur if it is used to solve a system of equations.

◆ gbtrs()

template<typename scalar_t >
void slate::gbtrs ( BandMatrix< scalar_t > &  A,
Pivots &  pivots,
Matrix< scalar_t > &  B,
Options const &  opts 
)

Distributed parallel band LU solve.

Solves a system of linear equations

\[ A X = B \]

with a general n-by-n band matrix \(A\) using the LU factorization computed by gbtrf. \(A\) can be transposed or conjugate-transposed.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]AThe factors \(L\) and \(U\) from the factorization \(A = L U\) as computed by gbtrf.
[in]pivotsThe pivot indices that define the permutations as computed by gbtrf.
[in,out]BOn entry, the n-by-nrhs right hand side matrix \(B\). On exit, the n-by-nrhs solution matrix \(X\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.