SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale

Factor, forward and back solve, invert.

Functions

template<typename scalar_t >
void slate::gerbt (Matrix< scalar_t > &U_in, Matrix< scalar_t > &A, Matrix< scalar_t > &V)
 Applies a two-sided random butterfly transform (RBT) to the given matrix.
 
template<typename scalar_t >
void slate::gerbt (Matrix< scalar_t > &Uin, Matrix< scalar_t > &B)
 Applies a one-sided random butterfly transform (RBT) to the given matrix on the left.
 
template<typename scalar_t >
int64_t slate::getrf (Matrix< scalar_t > &A, Pivots &pivots, Options const &opts)
 Distributed parallel LU factorization.
 
template<typename scalar_t >
int64_t slate::getrf_nopiv (Matrix< scalar_t > &A, Options const &opts)
 Distributed parallel LU factorization without pivoting.
 
template<typename scalar_t >
int64_t slate::getrf_tntpiv (Matrix< scalar_t > &A, Pivots &pivots, Options const &opts)
 Distributed parallel LU factorization.
 
template<typename scalar_t >
void slate::getri (Matrix< scalar_t > &A, Pivots &pivots, Options const &opts)
 Distributed parallel LU inversion.
 
template<typename scalar_t >
void slate::getri (Matrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel LU inversion (out-of-place version).
 
template<typename scalar_t >
void slate::getrs (Matrix< scalar_t > &A, Pivots &pivots, Matrix< scalar_t > &B, Options const &opts)
 Distributed parallel LU solve.
 
template<typename scalar_t >
void slate::getrs_nopiv (Matrix< scalar_t > &A, Matrix< scalar_t > &B, Options const &opts=Options())
 Distributed parallel LU solve without pivoting (deprecated; use getrs with Option::MethodLU set to MethodLU::NoPiv).
 

Detailed Description

Factor, forward and back solve, invert.

Function Documentation

◆ gerbt() [1/2]

template<typename scalar_t >
void slate::gerbt ( Matrix< scalar_t >& U_in, Matrix< scalar_t >& A, Matrix< scalar_t >& V )

Applies a two-sided random butterfly transform (RBT) to the given matrix.

Parameters
[in]     U_in  The left transform in packed storage. Should be transposed.
[in,out] A     The matrix to transform.
[in]     V     The right transform in packed storage. Should not be transposed.

◆ gerbt() [2/2]

template<typename scalar_t >
void slate::gerbt ( Matrix< scalar_t >& Uin, Matrix< scalar_t >& B )

Applies a one-sided random butterfly transform (RBT) to the given matrix on the left.

Parameters
[in]     Uin  The transform in packed storage.
[in,out] B    The matrix to transform.

◆ getrf()

template<typename scalar_t >
int64_t slate::getrf ( Matrix< scalar_t >& A, Pivots& pivots, Options const& opts )

Distributed parallel LU factorization.

Computes an LU factorization of a general m-by-n matrix \(A\) using partial pivoting with row interchanges.

The factorization has the form

\[ A = P L U \]

where \(P\) is a permutation matrix, \(L\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U\) is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

Complexity (in real):

  • \(\approx m n^{2} - \frac{1}{3} n^{3}\) flops;
  • \(\approx \frac{2}{3} n^{3}\) flops for \(m = n\).
Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A       On entry, the matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = P L U\); the unit diagonal elements of \(L\) are not stored.
[out]    pivots  The pivot indices that define the permutation matrix \(P\).
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
  • Option::PivotThreshold: Strictness of the pivot selection. Between 0 and 1 with 1 giving partial pivoting and 0 giving no pivoting. Default 1.
  • Option::MethodLU: Algorithm for LU factorization.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, and division by zero will occur if it is used to solve a system of equations.
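
For illustration, a minimal calling sequence might look like the sketch below. The matrix size, tile size nb, the 2-by-2 process grid, and the option values are placeholders, not SLATE defaults.

    #include <slate/slate.hh>
    #include <mpi.h>

    int main( int argc, char** argv )
    {
        MPI_Init( &argc, &argv );

        int64_t n = 1000, nb = 256;   // global size and tile size (placeholders)
        int p = 2, q = 2;             // MPI process grid (placeholder)

        // n-by-n general matrix distributed over a p-by-q grid of MPI ranks.
        slate::Matrix<double> A( n, n, nb, p, q, MPI_COMM_WORLD );
        A.insertLocalTiles();
        // ... fill the locally owned tiles of A with the application's data ...

        slate::Pivots pivots;
        int64_t info = slate::getrf( A, pivots, {
            { slate::Option::Lookahead, 1 },
            { slate::Option::Target, slate::Target::HostTask }
        } );
        if (info != 0) {
            // U(info, info) is exactly zero; the factor U is singular.
        }

        MPI_Finalize();
        return 0;
    }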

◆ getrf_nopiv()

template<typename scalar_t >
int64_t slate::getrf_nopiv ( Matrix< scalar_t >& A, Options const& opts )

Distributed parallel LU factorization without pivoting.

Computes an LU factorization without pivoting of a general m-by-n matrix \(A\)

The factorization has the form

\[ A = L U \]

where \(L\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U\) is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A     On entry, the matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = L U\); the unit diagonal elements of \(L\) are not stored.
[in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. Since no pivoting is performed, the computed factors contain NaN values due to division by zero.
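
As a brief sketch (the wrapper name and option values below are illustrative, not SLATE defaults), the non-pivoted factorization uses the same matrix setup as getrf but takes no pivot vector:

    #include <slate/slate.hh>

    // Factor A = L U in place, without pivoting. Safe only when A is known to
    // tolerate it (e.g., diagonally dominant); SLATE does not check this.
    template <typename scalar_t>
    int64_t factor_nopiv( slate::Matrix<scalar_t>& A )
    {
        return slate::getrf_nopiv( A, {
            { slate::Option::InnerBlocking, 16 },
            { slate::Option::Target, slate::Target::Devices }
        } );
    }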

◆ getrf_tntpiv()

template<typename scalar_t >
int64_t slate::getrf_tntpiv ( Matrix< scalar_t >& A, Pivots& pivots, Options const& opts )

Distributed parallel LU factorization.

Computes an LU factorization of a general m-by-n matrix \(A\) using partial pivoting with row interchanges.

The factorization has the form

\[ A = P L U \]

where \(P\) is a permutation matrix, \(L\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U\) is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A       On entry, the matrix \(A\) to be factored. On exit, the factors \(L\) and \(U\) from the factorization \(A = P L U\); the unit diagonal elements of \(L\) are not stored.
[out]    pivots  The pivot indices that define the permutation matrix \(P\).
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
Returns
0: successful exit
i > 0: \(U(i,i)\) is exactly zero, where \(i\) is a 1-based index. The factorization has been completed, but the factor \(U\) is exactly singular, and division by zero will occur if it is used to solve a system of equations.
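
The calling sequence is the same as getrf; a short sketch (the wrapper name and thread count are illustrative placeholders):

    #include <slate/slate.hh>

    // Tournament-pivoting panel variant: same interface and A = P L U result
    // as getrf.
    template <typename scalar_t>
    int64_t factor_tournament( slate::Matrix<scalar_t>& A, slate::Pivots& pivots )
    {
        return slate::getrf_tntpiv( A, pivots, {
            { slate::Option::MaxPanelThreads, 4 },  // placeholder thread count
            { slate::Option::Target, slate::Target::HostTask }
        } );
    }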

◆ getri() [1/2]

template<typename scalar_t >
void slate::getri ( Matrix< scalar_t >& A, Pivots& pivots, Matrix< scalar_t >& B, Options const& opts )

Distributed parallel LU inversion (out-of-place version).

Computes the inverse of a matrix \(A\) using the LU factorization \(A = P L U\) computed by getrf. Stores the result in \(B\). Does not change \(A\).

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]     A       The factors \(L\) and \(U\) from the factorization \(A = P L U\) as computed by getrf. Not modified.
[in]     pivots  The pivot indices that define the permutation matrix \(P\) as computed by getrf.
[out]    B       On exit, the n-by-n inverse of the matrix \(A\).
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
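
A sketch of out-of-place usage (the wrapper name is illustrative, and B is assumed to be pre-allocated with a distribution conforming to A):

    #include <slate/slate.hh>

    // A already holds the L and U factors from slate::getrf; pivots holds P.
    // On return, B holds inv(A), while A and pivots are left unchanged.
    template <typename scalar_t>
    void invert_out_of_place( slate::Matrix<scalar_t>& A, slate::Pivots& pivots,
                              slate::Matrix<scalar_t>& B )
    {
        slate::getri( A, pivots, B,
                      { { slate::Option::Target, slate::Target::HostTask } } );
    }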

◆ getri() [2/2]

template<typename scalar_t >
void slate::getri ( Matrix< scalar_t >& A, Pivots& pivots, Options const& opts )

Distributed parallel LU inversion.

Computes the inverse of a matrix \(A\) using the LU factorization \(A = P L U\) computed by getrf.

Complexity (in real): \(\approx \frac{4}{3} n^{3}\) flops.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out] A       On entry, the factors \(L\) and \(U\) from the factorization \(A = P L U\) as computed by getrf. On exit, the inverse of the original matrix \(A\).
[in]     pivots  The pivot indices that define the permutation matrix \(P\) as computed by getrf.
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
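
For example, factor and then invert in place (a minimal sketch; matrix setup and error handling are omitted, and the wrapper name is illustrative):

    #include <slate/slate.hh>

    // After the two calls, A holds inv(A_original), assuming info == 0.
    template <typename scalar_t>
    int64_t invert_in_place( slate::Matrix<scalar_t>& A )
    {
        slate::Pivots pivots;
        int64_t info = slate::getrf( A, pivots, slate::Options() );  // A = P L U
        if (info == 0)
            slate::getri( A, pivots, slate::Options() );             // A := inv(A)
        return info;
    }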

◆ getrs()

template<typename scalar_t >
void slate::getrs ( Matrix< scalar_t >& A, Pivots& pivots, Matrix< scalar_t >& B, Options const& opts )

Distributed parallel LU solve.

Solves a system of linear equations

\[ A X = B \]

with a general n-by-n matrix \(A\) using the LU factorization computed by getrf. \(A\) can be transposed or conjugate-transposed.

Complexity (in real): \(\approx 2 n^{2} r\) flops, where \(r\) is the number of right-hand sides.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]     A       The factors \(L\) and \(U\) from the factorization \(A = P L U\) as computed by getrf.
[in]     pivots  The pivot indices that define the permutation matrix \(P\) as computed by getrf.
[in,out] B       On entry, the n-by-nrhs right hand side matrix \(B\). On exit, the n-by-nrhs solution matrix \(X\).
[in]     opts    Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
  • Option::MethodLU: Algorithm for LU factorization.
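
A sketch of the solve step after getrf (the wrapper name is illustrative; the transpose note in the comment follows the description above):

    #include <slate/slate.hh>

    // Solve A X = B using the factors and pivots produced by slate::getrf.
    // B is overwritten with the solution X. A transposed or conjugate-transposed
    // view (via slate::transpose / slate::conj_transpose) may also be passed.
    template <typename scalar_t>
    void lu_solve( slate::Matrix<scalar_t>& A, slate::Pivots& pivots,
                   slate::Matrix<scalar_t>& B )
    {
        slate::getrs( A, pivots, B,
                      { { slate::Option::Lookahead, 1 } } );
    }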

◆ getrs_nopiv()

template<typename scalar_t >
void slate::getrs_nopiv ( Matrix< scalar_t >& A, Matrix< scalar_t >& B, Options const& opts )
Deprecated:
Use getrs( A, pivots, B, { Option::MethodLU, MethodLU::NoPiv } ).

Distributed parallel LU solve.

Solves a system of linear equations

\[ A X = B \]

with a general n-by-n matrix \(A\) using the LU factorization computed by getrf_nopiv. \(A\) can be transposed or conjugate-transposed.

Template Parameters
scalar_t  One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]     A     The factors \(L\) and \(U\) from the factorization \(A = L U\) as computed by getrf_nopiv.
[in,out] B     On entry, the n-by-nrhs right hand side matrix \(B\). On exit, the n-by-nrhs solution matrix \(X\).
[in]     opts  Additional options, as map of name = value pairs. Possible options:
  • Option::Lookahead: Number of panels to overlap with matrix updates. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
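
Since this routine is deprecated, a sketch of the replacement recommended by the note above (the wrapper name is illustrative, and passing an unused pivot vector plus MethodLU::NoPiv as an option value follows the deprecation note rather than a verified signature):

    #include <slate/slate.hh>

    // No-pivot solve through the current getrs interface, selecting the
    // no-pivot method via Option::MethodLU per the deprecation note.
    template <typename scalar_t>
    void solve_nopiv( slate::Matrix<scalar_t>& A, slate::Matrix<scalar_t>& B )
    {
        slate::Pivots pivots;  // not used by the no-pivot method (assumption)
        slate::getrs( A, pivots, B,
                      { { slate::Option::MethodLU, slate::MethodLU::NoPiv } } );
    }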