Factor, forward and back solve. More...

Functions
template<typename scalar_t >
int64_t	slate::gbtrf (BandMatrix< scalar_t > &A, Pivots &pivots, Options const &opts)
	Distributed parallel band LU factorization.

template<typename scalar_t >
void	slate::gbtrs (BandMatrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts)
	Distributed parallel band LU solve.

Detailed Description

Factor, forward and back solve.

Function Documentation

template<typename scalar_t >

Distributed parallel band LU factorization.

Computes an LU factorization of a general band m-by-n matrix \(A\) using partial pivoting with row interchanges.

The factorization has the form

\[ A = L U \]

where \(L\) is a product of permutation and unit lower triangular matrices, and \(U\) is upper triangular.

This is the right-looking Level 3 BLAS version of the algorithm.

Template Parameters

scalar_t One of float, double, std::complex<float>, std::complex<double>.

Parameters

[in,out]	A	On entry, the band matrix \(A\) to be factored. Tiles outside the bandwidth do not need to exist. For tiles that are partially outside the bandwidth, data outside the bandwidth should be explicitly set to zero. On exit, the factors \(L\) and \(U\) from the factorization \(A = L U\); the unit diagonal elements of \(L\) are not stored. The upper bandwidth is increased to accommodate fill-in of \(U\).
[out]	pivots	The pivot indices that define the permutations.
[in]	opts	Additional options, as map of name = value pairs. Possible options:

Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
Option::InnerBlocking: Inner blocking to use for panel. Default 16.
Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
Option::Target: Implementation to target. Possible values:
- HostTask: OpenMP tasks on CPU host [default].
- HostNest: nested OpenMP parallel for loop on CPU host.
- HostBatch: batched BLAS on CPU host.
- Devices: batched BLAS on GPU device.

Option::PivotThreshold: Strictness of the pivot selection. Between 0 and 1 with 1 giving partial pivoting and 0 giving no pivoting. Default 1.

Returns: 0: successful exit; i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, and division by zero will occur if it is used to solve a system of equations.

template<typename scalar_t >

Distributed parallel band LU solve.

Solves a system of linear equations

\[ A X = B \]

with a general n-by-n band matrix \(A\) using the LU factorization computed by gbtrf. \(A\) can be transposed or conjugate-transposed.

Template Parameters

scalar_t One of float, double, std::complex<float>, std::complex<double>.

Parameters

[in]	A	The factors \(L\) and \(U\) from the factorization \(A = L U\) as computed by gbtrf.
[in]	pivots	The pivot indices that define the permutations as computed by gbtrf.
[in,out]	B	On entry, the n-by-nrhs right hand side matrix \(B\). On exit, the n-by-nrhs solution matrix \(X\).
[in]	opts	Additional options, as map of name = value pairs. Possible options: Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1. Option::Target: Implementation to target. Possible values: HostTask: OpenMP tasks on CPU host [default]. HostNest: nested OpenMP parallel for loop on CPU host. HostBatch: batched BLAS on CPU host. Devices: batched BLAS on GPU device.