SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Loading...
Searching...
No Matches

Functions

template<typename scalar_t >
void slate::bdsqr (lapack::Job jobu, lapack::Job jobvt, std::vector< blas::real_type< scalar_t > > &D, std::vector< blas::real_type< scalar_t > > &E, Matrix< scalar_t > &U, Matrix< scalar_t > &VT, Options const &opts)
 Computes the singular values and, optionally, the right and/or left singular vectors from the singular value decomposition (SVD) of a real (upper or lower) bidiagonal matrix.
 
template<typename scalar_t >
void slate::ge2tb (Matrix< scalar_t > &A, TriangularFactors< scalar_t > &TU, TriangularFactors< scalar_t > &TV, Options const &opts)
 Distributed parallel reduction to band for 3-stage SVD.
 
template<typename scalar_t >
void slate::internal::gerfg (Matrix< scalar_t > &A, int64_t n, scalar_t *v)
 Generates a Householder reflector \(H = I - \tau v v^H\) using the first column of the matrix \(A\), i.e., a reflector that zeroes \(A[1:m-1, 0]\).
 
template<typename scalar_t >
void slate::internal::gerf (int64_t n, scalar_t *v, Matrix< scalar_t > &A)
 Applies a Householder reflector \(H = I - \tau v v^H\) to the matrix \(A\) from the left.
 
template<Target target, typename scalar_t >
void slate::internal::gebr1 (Matrix< scalar_t > &&A, int64_t n1, scalar_t *v1, int64_t n2, scalar_t *v2, int priority)
 Implements task 1 in the bidiagonal bulge chasing algorithm.
 
template<typename scalar_t >
void slate::internal::gebr1 (internal::TargetType< Target::HostTask >, Matrix< scalar_t > &A, int64_t n1, scalar_t *v1, int64_t n2, scalar_t *v2, int priority)
 Implements task 1 in the bidiagonal bulge chasing algorithm.
 
template<Target target, typename scalar_t >
void slate::internal::gebr2 (int64_t n1, scalar_t *v1, Matrix< scalar_t > &&A, int64_t n2, scalar_t *v2, int priority)
 Implements task 2 in the bidiagonal bulge chasing algorithm.
 
template<typename scalar_t >
void slate::internal::gebr2 (internal::TargetType< Target::HostTask >, int64_t n1, scalar_t *v1, Matrix< scalar_t > &A, int64_t n2, scalar_t *v2, int priority)
 Implements task 2 in the bidiagonal bulge chasing algorithm.
 
template<Target target, typename scalar_t >
void slate::internal::gebr3 (int64_t n1, scalar_t *v1, Matrix< scalar_t > &&A, int64_t n2, scalar_t *v2, int priority)
 Implements task 3 in the bidiagonal bulge chasing algorithm.
 
template<typename scalar_t >
void slate::internal::gebr3 (internal::TargetType< Target::HostTask >, int64_t n1, scalar_t *v1, Matrix< scalar_t > &A, int64_t n2, scalar_t *v2, int priority)
 Implements task 3 in the bidiagonal bulge chasing algorithm.
 
template<typename scalar_t >
void slate::tb2bd (TriangularBandMatrix< scalar_t > &A, Matrix< scalar_t > &U, Matrix< scalar_t > &V, Options const &opts)
 Reduces a band matrix to a bidiagonal matrix using bulge chasing.
 
template<typename scalar_t >
void slate::unmbr_ge2tb (Side side, Op op, Matrix< scalar_t > &A, TriangularFactors< scalar_t > T, Matrix< scalar_t > &C, Options const &opts)
 Multiplies the general m-by-n matrix C by Q from slate::he2hb as follows:
 

Detailed Description

Function Documentation

◆ bdsqr()

template<typename scalar_t >
void slate::bdsqr ( lapack::Job  jobu,
lapack::Job  jobvt,
std::vector< blas::real_type< scalar_t > > &  D,
std::vector< blas::real_type< scalar_t > > &  E,
Matrix< scalar_t > &  U,
Matrix< scalar_t > &  VT,
Options const &  opts 
)

Computes the singular values and, optionally, the right and/or left singular vectors from the singular value decomposition (SVD) of a real (upper or lower) bidiagonal matrix.

Generic implementation for any target.

ATTENTION: only singular values computed for now, no singular vectors. Only host computation supported for now.

◆ ge2tb()

template<typename scalar_t >
void slate::ge2tb ( Matrix< scalar_t > &  A,
TriangularFactors< scalar_t > &  TU,
TriangularFactors< scalar_t > &  TV,
Options const &  opts 
)

Distributed parallel reduction to band for 3-stage SVD.

Reduces an m-by-n matrix \(A\) to band form using unitary transformations. The factorization has the form

\[ A = U B V^H \]

where \(U\) and \(V\) are unitary. If m >= n, \(B\) is upper triangular band with nb superdiagonals; if m < n, \(B\) is lower triangular band with nb subdiagonals.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AOn entry, the m-by-n matrix \(A\). On exit:
  • If m >= n, the elements Aij for j = i, ..., i+nb, represent the upper triangular band matrix B. The elements below the main diagonal, along with TU, represent the unitary matrix \(U\) as a product of elementary reflectors. The elements above the nb-th superdiagonal, along with TV, represent the unitary matrix \(V\) as a product of elementary reflectors.
  • If m < n, the elements Aij for j = i-nb, ..., i, represent the lower triangular band matrix B. The elements below the nb-th subdiagonal, along with TU, represent the unitary matrix \(U\) as a product of elementary reflectors. The elements above the main diagonal, along with TV, represent the unitary matrix \(V\) as a product of elementary reflectors.
[out]TUOn exit, triangular matrices of the block reflectors for U.
[out]TVOn exit, triangular matrices of the block reflectors for V.
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device. Note a lookahead is not possible with ge2tb due to dependencies from updating on both left and right sides.

◆ gebr1() [1/2]

template<typename scalar_t >
void slate::internal::gebr1 ( internal::TargetType< Target::HostTask ,
Matrix< scalar_t > &  A,
int64_t  n1,
scalar_t *  v1,
int64_t  n2,
scalar_t *  v2,
int  priority 
)

Implements task 1 in the bidiagonal bulge chasing algorithm.

(see https://doi.org/10.1137/17M1117732 and http://www.icl.utk.edu/publications/swan-013)

Parameters
[in,out]AThe first block of a sweep.
[out]v1The Householder reflector to zero A[0, 1:n-1].
[out]v2The Householder reflector to zero A[2:m-1, 0].

◆ gebr1() [2/2]

template<Target target, typename scalar_t >
void slate::internal::gebr1 ( Matrix< scalar_t > &&  A,
int64_t  n1,
scalar_t *  v1,
int64_t  n2,
scalar_t *  v2,
int  priority 
)

Implements task 1 in the bidiagonal bulge chasing algorithm.

Dispatches to target implementations.

◆ gebr2() [1/2]

template<Target target, typename scalar_t >
void slate::internal::gebr2 ( int64_t  n1,
scalar_t *  v1,
Matrix< scalar_t > &&  A,
int64_t  n2,
scalar_t *  v2,
int  priority 
)

Implements task 2 in the bidiagonal bulge chasing algorithm.

Dispatches to target implementations.

◆ gebr2() [2/2]

template<typename scalar_t >
void slate::internal::gebr2 ( internal::TargetType< Target::HostTask ,
int64_t  n1,
scalar_t *  v1,
Matrix< scalar_t > &  A,
int64_t  n2,
scalar_t *  v2,
int  priority 
)

Implements task 2 in the bidiagonal bulge chasing algorithm.

Parameters
[in]v1The second Householder reflector produced by task 1.
[in,out]AAn off-diagonal block in a sweep.
[out]v2The Householder reflector to zero A[0, 1:n-1].

◆ gebr3() [1/2]

template<Target target, typename scalar_t >
void slate::internal::gebr3 ( int64_t  n1,
scalar_t *  v1,
Matrix< scalar_t > &&  A,
int64_t  n2,
scalar_t *  v2,
int  priority 
)

Implements task 3 in the bidiagonal bulge chasing algorithm.

Dispatches to target implementations.

◆ gebr3() [2/2]

template<typename scalar_t >
void slate::internal::gebr3 ( internal::TargetType< Target::HostTask ,
int64_t  n1,
scalar_t *  v1,
Matrix< scalar_t > &  A,
int64_t  n2,
scalar_t *  v2,
int  priority 
)

Implements task 3 in the bidiagonal bulge chasing algorithm.

Parameters
[in]v1The Householder reflector produced by task 2.
[in,out]AA diagonal block in a sweep.
[out]v2The Householder reflector to zero A[1:m-1, 0].

◆ gerf()

template<typename scalar_t >
void slate::internal::gerf ( int64_t  n,
scalar_t *  v,
Matrix< scalar_t > &  A 
)

Applies a Householder reflector \(H = I - \tau v v^H\) to the matrix \(A\) from the left.

Takes the \(\tau\) factor from \(v[0]\).

Parameters
[in]nLength of vector v.
[in]vThe vector v in the representation of H. Modified but restored on exit.
[in,out]AThe m-by-n matrix A.

◆ gerfg()

template<typename scalar_t >
void slate::internal::gerfg ( Matrix< scalar_t > &  A,
int64_t  n,
scalar_t *  v 
)

Generates a Householder reflector \(H = I - \tau v v^H\) using the first column of the matrix \(A\), i.e., a reflector that zeroes \(A[1:m-1, 0]\).

Stores \(\tau\) in \(v[0]\).

Parameters
[in]AThe m-by-n matrix A.
[out]vThe vector v in the representation of H.

◆ tb2bd()

template<typename scalar_t >
void slate::tb2bd ( TriangularBandMatrix< scalar_t > &  A,
Matrix< scalar_t > &  U,
Matrix< scalar_t > &  V,
Options const &  opts 
)

Reduces a band matrix to a bidiagonal matrix using bulge chasing.

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in,out]AThe band matrix A.
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.

◆ unmbr_ge2tb()

template<typename scalar_t >
void slate::unmbr_ge2tb ( Side  side,
Op  op,
Matrix< scalar_t > &  A,
TriangularFactors< scalar_t >  T,
Matrix< scalar_t > &  C,
Options const &  opts 
)

Multiplies the general m-by-n matrix C by Q from slate::he2hb as follows:

op side = Left side = Right
op = NoTrans \(Q C \) \(C Q \)
op = ConjTrans \(Q^H C\) \(C Q^H\)

where \(Q\) is a unitary matrix defined as the product of k elementary reflectors

\[ Q = H(1) H(2) . . . H(k) \]

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]side
  • Side::Left: apply \(Q\) or \(Q^H\) from the left;
  • Side::Right: apply \(Q\) or \(Q^H\) from the right.
[in]op
  • Op::NoTrans apply \(Q\);
  • Op::ConjTrans: apply \(Q^H\);
  • Op::Trans: apply \(Q^T\) (only if real). In the real case, Op::Trans is equivalent to Op::ConjTrans. In the complex case, Op::Trans is not allowed.
[in]AOn entry, the n-by-n Hermitian matrix \(A\), as returned by slate::he2hb.
[in]TOn entry, triangular matrices of the elementary reflector H(i), as returned by slate::he2hb.
[in,out]COn entry, the m-by-n matrix \(C\). On exit, \(C\) is overwritten by \(Q C\), \(Q^H C\), \(C Q\), or \(C Q^H\).
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.