Functions
template<typename scalar_t >
void	slate::bdsqr (lapack::Job jobu, lapack::Job jobvt, std::vector< blas::real_type< scalar_t > > &D, std::vector< blas::real_type< scalar_t > > &E, Matrix< scalar_t > &U, Matrix< scalar_t > &VT, Options const &opts)
	Computes the singular values and, optionally, the right and/or left singular vectors from the singular value decomposition (SVD) of a real (upper or lower) bidiagonal matrix.

template<typename scalar_t >
void	slate::ge2tb (Matrix< scalar_t > &A, TriangularFactors< scalar_t > &TU, TriangularFactors< scalar_t > &TV, Options const &opts)
	Distributed parallel reduction to band for 3-stage SVD.

template<typename scalar_t >
void	slate::internal::gerfg (Matrix< scalar_t > &A, int64_t n, scalar_t *v)
	Generates a Householder reflector \(H = I - \tau v v^H\) using the first column of the matrix \(A\), i.e., a reflector that zeroes \(A[1:m-1, 0]\).

template<typename scalar_t >
void	slate::internal::gerf (int64_t n, scalar_t *v, Matrix< scalar_t > &A)
	Applies a Householder reflector \(H = I - \tau v v^H\) to the matrix \(A\) from the left.

template<Target target, typename scalar_t >
void	slate::internal::gebr1 (Matrix< scalar_t > &&A, int64_t n1, scalar_t v1, int64_t n2, scalar_t v2, int priority)
	Implements task 1 in the bidiagonal bulge chasing algorithm.

template<typename scalar_t >
void	slate::internal::gebr1 (internal::TargetType< Target::HostTask >, Matrix< scalar_t > &A, int64_t n1, scalar_t v1, int64_t n2, scalar_t v2, int priority)
	Implements task 1 in the bidiagonal bulge chasing algorithm.

template<Target target, typename scalar_t >
void	slate::internal::gebr2 (int64_t n1, scalar_t v1, Matrix< scalar_t > &&A, int64_t n2, scalar_t v2, int priority)
	Implements task 2 in the bidiagonal bulge chasing algorithm.

template<typename scalar_t >
void	slate::internal::gebr2 (internal::TargetType< Target::HostTask >, int64_t n1, scalar_t v1, Matrix< scalar_t > &A, int64_t n2, scalar_t v2, int priority)
	Implements task 2 in the bidiagonal bulge chasing algorithm.

template<Target target, typename scalar_t >
void	slate::internal::gebr3 (int64_t n1, scalar_t v1, Matrix< scalar_t > &&A, int64_t n2, scalar_t v2, int priority)
	Implements task 3 in the bidiagonal bulge chasing algorithm.

template<typename scalar_t >
void	slate::internal::gebr3 (internal::TargetType< Target::HostTask >, int64_t n1, scalar_t v1, Matrix< scalar_t > &A, int64_t n2, scalar_t v2, int priority)
	Implements task 3 in the bidiagonal bulge chasing algorithm.

template<typename scalar_t >
void	slate::tb2bd (TriangularBandMatrix< scalar_t > &A, Matrix< scalar_t > &U, Matrix< scalar_t > &V, Options const &opts)
	Reduces a band matrix to a bidiagonal matrix using bulge chasing.

template<typename scalar_t >
void	slate::unmbr_ge2tb (Side side, Op op, Matrix< scalar_t > &A, TriangularFactors< scalar_t > T, Matrix< scalar_t > &C, Options const &opts)
	Multiplies the general m-by-n matrix C by Q from `slate::he2hb` as follows:

Detailed Description

Function Documentation

◆ bdsqr()

template<typename scalar_t >

void slate::bdsqr	(	lapack::Job	jobu,
		lapack::Job	jobvt,
		std::vector< blas::real_type< scalar_t > > &	D,
		std::vector< blas::real_type< scalar_t > > &	E,
		Matrix< scalar_t > &	U,
		Matrix< scalar_t > &	VT,
		Options const &	opts
	)

Computes the singular values and, optionally, the right and/or left singular vectors from the singular value decomposition (SVD) of a real (upper or lower) bidiagonal matrix.

Generic implementation for any target.

ATTENTION: only singular values computed for now, no singular vectors. Only host computation supported for now.

◆ ge2tb()

template<typename scalar_t >

void slate::ge2tb	(	Matrix< scalar_t > &	A,
		TriangularFactors< scalar_t > &	TU,
		TriangularFactors< scalar_t > &	TV,
		Options const &	opts
	)

Distributed parallel reduction to band for 3-stage SVD.

Reduces an m-by-n matrix \(A\) to band form using unitary transformations. The factorization has the form

\[ A = U B V^H \]

where \(U\) and \(V\) are unitary. If m >= n, \(B\) is upper triangular band with nb superdiagonals; if m < n, \(B\) is lower triangular band with nb subdiagonals.

Template Parameters

scalar_t One of float, double, std::complex<float>, std::complex<double>.

Parameters

[in,out]	A	On entry, the m-by-n matrix \(A\). On exit: If m >= n, the elements Aij for j = i, ..., i+nb, represent the upper triangular band matrix B. The elements below the main diagonal, along with TU, represent the unitary matrix \(U\) as a product of elementary reflectors. The elements above the nb-th superdiagonal, along with TV, represent the unitary matrix \(V\) as a product of elementary reflectors. If m < n, the elements Aij for j = i-nb, ..., i, represent the lower triangular band matrix B. The elements below the nb-th subdiagonal, along with TU, represent the unitary matrix \(U\) as a product of elementary reflectors. The elements above the main diagonal, along with TV, represent the unitary matrix \(V\) as a product of elementary reflectors.
[out]	TU	On exit, triangular matrices of the block reflectors for U.
[out]	TV	On exit, triangular matrices of the block reflectors for V.
[in]	opts	Additional options, as map of name = value pairs. Possible options: Option::InnerBlocking: Inner blocking to use for panel. Default 16. Option::MaxPanelThreads: Number of threads to use for panel. Default omp_get_max_threads()/2. Option::Target: Implementation to target. Possible values: HostTask: OpenMP tasks on CPU host [default]. HostNest: nested OpenMP parallel for loop on CPU host. HostBatch: batched BLAS on CPU host. Devices: batched BLAS on GPU device. Note a lookahead is not possible with ge2tb due to dependencies from updating on both left and right sides.

◆ gebr1() [1/2]

template<typename scalar_t >

void slate::internal::gebr1	(	internal::TargetType< Target::HostTask >	,
		Matrix< scalar_t > &	A,
		int64_t	n1,
		scalar_t *	v1,
		int64_t	n2,
		scalar_t *	v2,
		int	priority
	)

Implements task 1 in the bidiagonal bulge chasing algorithm.

(see https://doi.org/10.1137/17M1117732 and http://www.icl.utk.edu/publications/swan-013)

Parameters

[in,out]	A	The first block of a sweep.
[out]	v1	The Householder reflector to zero A[0, 1:n-1].
[out]	v2	The Householder reflector to zero A[2:m-1, 0].

◆ gebr1() [2/2]

template<Target target, typename scalar_t >

void slate::internal::gebr1	(	Matrix< scalar_t > &&	A,
		int64_t	n1,
		scalar_t *	v1,
		int64_t	n2,
		scalar_t *	v2,
		int	priority
	)

Implements task 1 in the bidiagonal bulge chasing algorithm.

Dispatches to target implementations.

◆ gebr2() [1/2]

template<Target target, typename scalar_t >

void slate::internal::gebr2	(	int64_t	n1,
		scalar_t *	v1,
		Matrix< scalar_t > &&	A,
		int64_t	n2,
		scalar_t *	v2,
		int	priority
	)

Implements task 2 in the bidiagonal bulge chasing algorithm.

Dispatches to target implementations.

◆ gebr2() [2/2]

template<typename scalar_t >

void slate::internal::gebr2	(	internal::TargetType< Target::HostTask >	,
		int64_t	n1,
		scalar_t *	v1,
		Matrix< scalar_t > &	A,
		int64_t	n2,
		scalar_t *	v2,
		int	priority
	)

Implements task 2 in the bidiagonal bulge chasing algorithm.

Parameters

[in]	v1	The second Householder reflector produced by task 1.
[in,out]	A	An off-diagonal block in a sweep.
[out]	v2	The Householder reflector to zero A[0, 1:n-1].

◆ gebr3() [1/2]

template<Target target, typename scalar_t >

void slate::internal::gebr3	(	int64_t	n1,
		scalar_t *	v1,
		Matrix< scalar_t > &&	A,
		int64_t	n2,
		scalar_t *	v2,
		int	priority
	)

Implements task 3 in the bidiagonal bulge chasing algorithm.

Dispatches to target implementations.

◆ gebr3() [2/2]

template<typename scalar_t >

void slate::internal::gebr3	(	internal::TargetType< Target::HostTask >	,
		int64_t	n1,
		scalar_t *	v1,
		Matrix< scalar_t > &	A,
		int64_t	n2,
		scalar_t *	v2,
		int	priority
	)

Implements task 3 in the bidiagonal bulge chasing algorithm.

Parameters

[in]	v1	The Householder reflector produced by task 2.
[in,out]	A	A diagonal block in a sweep.
[out]	v2	The Householder reflector to zero A[1:m-1, 0].

◆ gerf()

template<typename scalar_t >

void slate::internal::gerf	(	int64_t	n,
		scalar_t *	v,
		Matrix< scalar_t > &	A
	)

Applies a Householder reflector \(H = I - \tau v v^H\) to the matrix \(A\) from the left.

Takes the \(\tau\) factor from \(v[0]\).

Parameters

[in]	n	Length of vector v.
[in]	v	The vector v in the representation of H. Modified but restored on exit.
[in,out]	A	The m-by-n matrix A.

◆ gerfg()

template<typename scalar_t >

void slate::internal::gerfg	(	Matrix< scalar_t > &	A,
		int64_t	n,
		scalar_t *	v
	)

Generates a Householder reflector \(H = I - \tau v v^H\) using the first column of the matrix \(A\), i.e., a reflector that zeroes \(A[1:m-1, 0]\).

Stores \(\tau\) in \(v[0]\).

Parameters

[in]	A	The m-by-n matrix A.
[out]	v	The vector v in the representation of H.

◆ tb2bd()

template<typename scalar_t >

void slate::tb2bd	(	TriangularBandMatrix< scalar_t > &	A,
		Matrix< scalar_t > &	U,
		Matrix< scalar_t > &	V,
		Options const &	opts
	)

Reduces a band matrix to a bidiagonal matrix using bulge chasing.

Template Parameters

scalar_t One of float, double, std::complex<float>, std::complex<double>.

Parameters

[in,out]	A	The band matrix A.
[in]	opts	Additional options, as map of name = value pairs. Possible options: Option::Target: Implementation to target. Possible values: HostTask: OpenMP tasks on CPU host [default]. HostNest: nested OpenMP parallel for loop on CPU host. HostBatch: batched BLAS on CPU host. Devices: batched BLAS on GPU device.

◆ unmbr_ge2tb()

template<typename scalar_t >

void slate::unmbr_ge2tb	(	Side	side,
		Op	op,
		Matrix< scalar_t > &	A,
		TriangularFactors< scalar_t >	T,
		Matrix< scalar_t > &	C,
		Options const &	opts
	)

Multiplies the general m-by-n matrix C by Q from slate::he2hb as follows:

op	side = Left	side = Right
op = NoTrans	\(Q C \)	\(C Q \)
op = ConjTrans	\(Q^H C\)	\(C Q^H\)

where \(Q\) is a unitary matrix defined as the product of k elementary reflectors

\[ Q = H(1) H(2) . . . H(k) \]

Template Parameters

scalar_t One of float, double, std::complex<float>, std::complex<double>.

Parameters

[in]	side	Side::Left: apply \(Q\) or \(Q^H\) from the left; Side::Right: apply \(Q\) or \(Q^H\) from the right.
[in]	op	Op::NoTrans apply \(Q\); Op::ConjTrans: apply \(Q^H\); Op::Trans: apply \(Q^T\) (only if real). In the real case, Op::Trans is equivalent to Op::ConjTrans. In the complex case, Op::Trans is not allowed.
[in]	A	On entry, the n-by-n Hermitian matrix \(A\), as returned by `slate::he2hb`.
[in]	T	On entry, triangular matrices of the elementary reflector H(i), as returned by `slate::he2hb`.
[in,out]	C	On entry, the m-by-n matrix \(C\). On exit, \(C\) is overwritten by \(Q C\), \(Q^H C\), \(C Q\), or \(C Q^H\).
[in]	opts	Additional options, as map of name = value pairs. Possible options: Option::Target: Implementation to target. Possible values: HostTask: OpenMP tasks on CPU host [default]. HostNest: nested OpenMP parallel for loop on CPU host. HostBatch: batched BLAS on CPU host. Devices: batched BLAS on GPU device.

Functions

Detailed Description

Function Documentation

◆ bdsqr()

◆ ge2tb()

◆ gebr1() [1/2]

◆ gebr1() [2/2]

◆ gebr2() [1/2]

◆ gebr2() [2/2]

◆ gebr3() [1/2]

◆ gebr3() [2/2]

◆ gerf()

◆ gerfg()

◆ tb2bd()

◆ unmbr_ge2tb()