SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Loading...
Searching...
No Matches

\(A = U \Sigma V^H\) More...

Functions

template<typename scalar_t >
void slate::svd (Matrix< scalar_t > A, std::vector< blas::real_type< scalar_t > > &Sigma, Matrix< scalar_t > &U, Matrix< scalar_t > &VT, Options const &opts)
 Distributed parallel matrix singular value decomposition.
 

Detailed Description

\(A = U \Sigma V^H\)

Function Documentation

◆ svd()

template<typename scalar_t >
void slate::svd ( Matrix< scalar_t >  A,
std::vector< blas::real_type< scalar_t > > &  Sigma,
Matrix< scalar_t > &  U,
Matrix< scalar_t > &  VT,
Options const &  opts 
)

Distributed parallel matrix singular value decomposition.

Computes all singular values and, optionally, singular vectors of a matrix A. The matrix A is preliminary reduced to bidiagonal form using a two-stage approach: First stage: reduction to upper band bidiagonal form (see ge2tb); Second stage: reduction from band to bidiagonal form (see tb2bd).

Template Parameters
scalar_tOne of float, double, std::complex<float>, std::complex<double>.
Parameters
[in]AOn entry, the m-by-n matrix \(A\). On exit, contents are destroyed.
[out]SigmaThe vector Sigma of length min( m, n ). If successful, the singular values in ascending order.
[out]UOn entry, if U is empty, does not compute the left singular vectors. Otherwise, the m-by-min_mn ("economy size") or m-by-m ("all vectors") matrix \(U\) to store the left singular vectors. On exit, the left orthonormal singular vectors of the matrix A.
[out]VTOn entry, if VT is empty, does not compute the right singular vectors. Otherwise, the min_mn-by-n ("economy size") or n-by-n ("all vectors") matrix \(VT\) to store the right singular vectors. On exit, the right orthonormal singular vectors of the matrix A.
[in]optsAdditional options, as map of name = value pairs. Possible options:
  • Option::InnerBlocking: Inner blocking to use for panel. Default 16.
  • Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device. Note A is passed by value, so we can transpose if needed without affecting caller.