Functions
template<typename scalar_t >
void	slate::tile::geqrf (int64_t ib, std::vector< Tile< scalar_t > > &tiles, std::vector< int64_t > &tile_indices, Tile< scalar_t > &T, int thread_rank, int thread_size, ThreadBarrier &thread_barrier, std::vector< blas::real_type< scalar_t > > &scale, std::vector< blas::real_type< scalar_t > > &sumsq, blas::real_type< scalar_t > &xnorm, std::vector< std::vector< scalar_t > > &W)
	Compute the QR factorization of a panel.

template<typename scalar_t >
void	slate::tile::householder_reflection_generator (std::vector< Tile< scalar_t > > &tiles, std::vector< int64_t > &tile_indices, Tile< scalar_t > &T, int thread_rank, int thread_size, ThreadBarrier &thread_barrier, std::vector< blas::real_type< scalar_t > > &scale, std::vector< blas::real_type< scalar_t > > &sumsq, blas::real_type< scalar_t > &xnorm, std::vector< std::vector< scalar_t > > &W)
	Compute at the qr2 level, Householder reflections of a panel, with tau scalars.

template<typename scalar_t >
void	slate::tile::tpmqrt (Side side, Op op, int64_t l, Tile< scalar_t > V2, Tile< scalar_t > T, Tile< scalar_t > C1, Tile< scalar_t > C2)
	Multiply the matrix C by the unitary matrix Q obtained from a "triangular-pentagonal" block reflector H.

template<typename scalar_t >
void	slate::tile::tpqrt (int64_t l, Tile< scalar_t > A1, Tile< scalar_t > A2, Tile< scalar_t > T)
	Compute the triangular-pentagonal QR factorization of 2 tiles, A1 and A2.

Detailed Description

Function Documentation

◆ geqrf()

template<typename scalar_t >

void slate::tile::geqrf	(	int64_t	ib,
		std::vector< Tile< scalar_t > > &	tiles,
		std::vector< int64_t > &	tile_indices,
		Tile< scalar_t > &	T,
		int	thread_rank,
		int	thread_size,
		ThreadBarrier &	thread_barrier,
		std::vector< blas::real_type< scalar_t > > &	scale,
		std::vector< blas::real_type< scalar_t > > &	sumsq,
		blas::real_type< scalar_t > &	xnorm,
		std::vector< std::vector< scalar_t > > &	W
	)

Compute the QR factorization of a panel.

Parameters

[in]	ib	internal blocking in the panel
[in,out]	tiles	local tiles in the panel
[in]	tile_indices	i indices of the tiles in the panel
[out]	T	upper triangular factor of the block reflector
[in]	thread_rank	rank of this thread
[in]	thread_size	number of local threads
[in]	thread_barrier	barrier for synchronizing local threads
[out]	scale	component of lassq to compute Householder norm in a stable manner
[out]	sumsq	component of lassq to compute Householder norm in a stable manner
[out]	xnorm	Householder norm
[out]	W

◆ householder_reflection_generator()

template<typename scalar_t >

void slate::tile::householder_reflection_generator	(	std::vector< Tile< scalar_t > > &	tiles,
		std::vector< int64_t > &	tile_indices,
		Tile< scalar_t > &	T,
		int	thread_rank,
		int	thread_size,
		ThreadBarrier &	thread_barrier,
		std::vector< blas::real_type< scalar_t > > &	scale,
		std::vector< blas::real_type< scalar_t > > &	sumsq,
		blas::real_type< scalar_t > &	xnorm,
		std::vector< std::vector< scalar_t > > &	W
	)

Compute at the qr2 level, Householder reflections of a panel, with tau scalars.

Additionally, one can: Compute blocking factor T column-by-column Compute trailing matrix update at qr2 level (for a full factorization) by manually changing #ifdef specific to desired computation We care for two reasons: 1) Performance comparison, evaluate panel performance without trailing matrix update and/or constructing T 2) As a random orthogonal matrix generator code (no need to update)

Parameters

[in,out]	tiles	local tiles in the panel
[in]	tile_indices	i indices of the tiles in the panel
[out]	T	upper triangular factor of the block reflector
[in]	thread_rank	rank of this thread
[in]	thread_size	number of local threads
[in]	thread_barrier	barrier for synchronizing local threads

todo: add missing params

◆ tpmqrt()

template<typename scalar_t >

void slate::tile::tpmqrt	(	Side	side,
		Op	op,
		int64_t	l,
		Tile< scalar_t >	V2,
		Tile< scalar_t >	T,
		Tile< scalar_t >	C1,
		Tile< scalar_t >	C2
	)

Multiply the matrix C by the unitary matrix Q obtained from a "triangular-pentagonal" block reflector H.

C consists of two tiles, C1 and C2.

If side == Left:

[ I  ] <== k-by-k    C = [ C1 ]  <== k-by-n
[ V2 ] <== m-by-k        [ C2 ]  <== m-by-n

and on exit, \(C = op(Q) C\). l, k, m are the same in tpqrt; n here is different.

If side == Right:

C = [ C1  C2 ]      [ I  ] <== k-by-k
  m-by-k  m-by-n    [ V2 ] <== n-by-k

and on exit, \(C = C op(Q)\). l, k are the same in tpqrt; n = tpqrt's m; m here is different.

Q is a product of block reflectors,

Q = \prod_{j = 1, ..., r} I - Vj Tj Vj^H

where r is the number of blocks, Tj is the j-th block of T, and Vj is the j-th block column of V, with internal blocking size ib.

See Further Details in tpqrt.

Parameters

[in]	side	Side::Left: Multiply from the left: \(C = op(Q) C\). Side::Right: Multiply from the right: \(C = C op(Q)\).
[in]	op	Op::NoTrans: Multiply by \(op(Q) = Q\). Op::Trans: Multiply by \(op(Q) = Q^T\) (only in real case). Op::ConjTrans: Multiply by \(op(Q) = Q^H\).
[in]	l	The number of rows of the upper trapezoidal part of V2. If side = left, min( m, k ) >= l >= 0. If side = right, min( n, k ) >= l >= 0.
[in]	V2	The M-by-k, upper pentagonal matrix V2, in an M2-by-k tile, where M2 >= M. If side == Left, M = m. If side == Right, M = n. The i-th column must contain the vector which defines the elementary reflector H(i), for i = 1, 2, ..., k, as returned by tpqrt in A2. The top (M-l)-by-k portion is rectangular, the bottom l-by-k portion is upper trapezoidal. See Further Details in tpqrt.
[in]	T	The upper triangular factors of the block reflectors as returned by tpqrt, stored as an ib-by-k tile.
[in,out]	C1	If side == Left, the k-by-n matrix C1, in a k1-by-n tile, k1 >= k. If side == Right, the m-by-k matrix C1, in an m-by-k1 tile, k1 >= k. On exit, C1 is overwritten by the corresponding block of \(op(Q) C\) or \(C op(Q)\).
[in,out]	C2	The m-by-n matrix C2. If side == Left, C2 is in an m2-by-n tile, where m2 >= m. If side == Right, C2 is in an m-by-n2 tile, where n2 >= n. On exit, C2 is overwritten by the corresponding block of \(op(Q) C\) or \(C op(Q)\).

Note: compared to LAPACK, A is renamed here => C1, B => C2, V => V2.

◆ tpqrt()

template<typename scalar_t >

void slate::tile::tpqrt	(	int64_t	l,
		Tile< scalar_t >	A1,
		Tile< scalar_t >	A2,
		Tile< scalar_t >	T
	)

Compute the triangular-pentagonal QR factorization of 2 tiles, A1 and A2.

On exit, the pentagonal tile A2 has been eliminated.

Parameters

[in]	l	The number of rows of the upper trapezoidal part of A2. min( m, k ) >= l >= 0. See Further Details.
[in,out]	A1	On entry, the k-by-k upper triangular matrix A1, in a k1-by-k tile, where k1 >= k.
[in,out]	A2	On entry, the m-by-k upper pentagonal matrix A2, in an m2-by-k tile, where m2 >= m. On exit, the columns represent the Householder reflectors. The top (m-l)-by-k portion is rectangular, the bottom l-by-k portion is upper trapezoidal.
[out]	T	Array of size ib-by-k, where ib is the internal blocking to use, 1 <= ib <= k, in an ib3-by-k3 tile, ib3 >= ib and k3 >= k. On exit, stores a sequence of ib-by-ib upper triangular T matrices representing the block Householder reflectors. See Further Details.

Further Details

Let A be the (k + m)-by-k matrix

A = [ A1 ]  <== k-by-k upper triangular
    [ A2 ]  <== m-by-k upper pentagonal

For all cases, A1 is upper triangular. Example with k = 4, with . representing non-zeros.

A1 = [ . . . . ]  <== k-by-k upper triangular
     [   . . . ]
     [     . . ]
     [       . ]
     [- - - - -]
     [         ]  <== if k1 > k, all-zero rows are ignored

Depending on m, k, l, there are several cases for A2. Currently, SLATE supports only cases 1, 2, and 3. It assumes m = min( A2.mb, A2.nb ), and l = m or l = 0.

Case 1: m = k = l, A2 is upper triangular. Example with m = k = l = 4.

A2 = [ . . . . ]  <== l-by-k upper triangular
     [   . . . ]
     [     . . ]
     [       . ]
     [- - - - -]
     [         ]  <== if m2 > m, all-zero rows are ignored

Case 2: m < k and l = min( m, k ) = m, A2 is upper trapezoidal (wide). Example with m = l = 3, k = 4.

A2 = [ . . . . ]  <== l-by-k upper trapezoidal
     [   . . . ]
     [     . . ]

Case 3: l = 0, A2 is just the rectangular portion. Currently unused in SLATE, but should work. Example with m = 3, l = 0, k = 4.

A2 = [ . . . . ]  <== m-by-k rectangular
     [ . . . . ]
     [ . . . . ]

Case 4: m > k and l = k, A2 is upper trapezoidal (tall). Currently unsupported in SLATE; would require explicitly passing m. Example with m = 6, l = k = 4.

A2 = [ . . . . ]  <== (m - l)-by-k rectangular
     [ . . . . ]
     [---------]
     [ . . . . ]  <== l-by-k upper triangular
     [   . . . ]
     [     . . ]
     [       . ]

Case 5: 0 < l < min( m, k ), A2 is upper pentagonal. Currently unsupported in SLATE; would require explicitly passing m. Example with m = 5, l = 3, k = 4.

A2 = [ . . . . ]  <== (m - l)-by-k rectangular
     [ . . . . ]
     [---------]
     [ . . . . ]  <== l-by-k upper trapezoidal
     [   . . . ]
     [     . . ]

After factoring, the vector representing the elementary reflector H(i) is in the i-th column of the (m + k)-by-k matrix V:

V = [ I  ] <== k-by-k identity
    [ V2 ] <== m-by-k pentagonal, same form as A2.

Thus, all of the information needed for V is contained in V2, which has the same form as A2 and overwrites A2 on exit.

The number of blocks is r = ceiling( k/ib ), where each block is of order ib except for the last block, which is of order rb = k - (r-1)*ib. For each of the r blocks, an upper triangular block reflector factor is computed: T1, T2, ..., Tr. The ib-by-ib (and rb-by-rb for the last block) T's are stored in the ib-by-k matrix T as

T = [ T1 T2 ... Tr ]

Note: compared to LAPACK, A is renamed here => A1, B => A2, W => V, V => V2, and n => k. This makes k match k in tpmqrt.

Functions

Detailed Description

Function Documentation

◆ geqrf()

◆ householder_reflection_generator()

◆ tpmqrt()

◆ tpqrt()

Further Details