SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Loading...
Searching...
No Matches

Functions

template<typename scalar_t >
void slate::tile::geqrf (int64_t ib, std::vector< Tile< scalar_t > > &tiles, std::vector< int64_t > &tile_indices, Tile< scalar_t > &T, int thread_rank, int thread_size, ThreadBarrier &thread_barrier, std::vector< blas::real_type< scalar_t > > &scale, std::vector< blas::real_type< scalar_t > > &sumsq, blas::real_type< scalar_t > &xnorm, std::vector< std::vector< scalar_t > > &W)
 Compute the QR factorization of a panel.
 
template<typename scalar_t >
void slate::tile::householder_reflection_generator (std::vector< Tile< scalar_t > > &tiles, std::vector< int64_t > &tile_indices, Tile< scalar_t > &T, int thread_rank, int thread_size, ThreadBarrier &thread_barrier, std::vector< blas::real_type< scalar_t > > &scale, std::vector< blas::real_type< scalar_t > > &sumsq, blas::real_type< scalar_t > &xnorm, std::vector< std::vector< scalar_t > > &W)
 Compute at the qr2 level, Householder reflections of a panel, with tau scalars.
 
template<typename scalar_t >
void slate::tile::tpmqrt (Side side, Op op, int64_t l, Tile< scalar_t > V2, Tile< scalar_t > T, Tile< scalar_t > C1, Tile< scalar_t > C2)
 Multiply the matrix C by the unitary matrix Q obtained from a "triangular-pentagonal" block reflector H.
 
template<typename scalar_t >
void slate::tile::tpqrt (int64_t l, Tile< scalar_t > A1, Tile< scalar_t > A2, Tile< scalar_t > T)
 Compute the triangular-pentagonal QR factorization of 2 tiles, A1 and A2.
 

Detailed Description

Function Documentation

◆ geqrf()

template<typename scalar_t >
void slate::tile::geqrf ( int64_t  ib,
std::vector< Tile< scalar_t > > &  tiles,
std::vector< int64_t > &  tile_indices,
Tile< scalar_t > &  T,
int  thread_rank,
int  thread_size,
ThreadBarrier &  thread_barrier,
std::vector< blas::real_type< scalar_t > > &  scale,
std::vector< blas::real_type< scalar_t > > &  sumsq,
blas::real_type< scalar_t > &  xnorm,
std::vector< std::vector< scalar_t > > &  W 
)

Compute the QR factorization of a panel.

Parameters
[in]ibinternal blocking in the panel
[in,out]tileslocal tiles in the panel
[in]tile_indicesi indices of the tiles in the panel
[out]Tupper triangular factor of the block reflector
[in]thread_rankrank of this thread
[in]thread_sizenumber of local threads
[in]thread_barrierbarrier for synchronizing local threads
[out]scalecomponent of lassq to compute Householder norm in a stable manner
[out]sumsqcomponent of lassq to compute Householder norm in a stable manner
[out]xnormHouseholder norm
[out]W

◆ householder_reflection_generator()

template<typename scalar_t >
void slate::tile::householder_reflection_generator ( std::vector< Tile< scalar_t > > &  tiles,
std::vector< int64_t > &  tile_indices,
Tile< scalar_t > &  T,
int  thread_rank,
int  thread_size,
ThreadBarrier &  thread_barrier,
std::vector< blas::real_type< scalar_t > > &  scale,
std::vector< blas::real_type< scalar_t > > &  sumsq,
blas::real_type< scalar_t > &  xnorm,
std::vector< std::vector< scalar_t > > &  W 
)

Compute at the qr2 level, Householder reflections of a panel, with tau scalars.

Additionally, one can: Compute blocking factor T column-by-column Compute trailing matrix update at qr2 level (for a full factorization) by manually changing #ifdef specific to desired computation We care for two reasons: 1) Performance comparison, evaluate panel performance without trailing matrix update and/or constructing T 2) As a random orthogonal matrix generator code (no need to update)

Parameters
[in,out]tileslocal tiles in the panel
[in]tile_indicesi indices of the tiles in the panel
[out]Tupper triangular factor of the block reflector
[in]thread_rankrank of this thread
[in]thread_sizenumber of local threads
[in]thread_barrierbarrier for synchronizing local threads

todo: add missing params

◆ tpmqrt()

template<typename scalar_t >
void slate::tile::tpmqrt ( Side  side,
Op  op,
int64_t  l,
Tile< scalar_t >  V2,
Tile< scalar_t >  T,
Tile< scalar_t >  C1,
Tile< scalar_t >  C2 
)

Multiply the matrix C by the unitary matrix Q obtained from a "triangular-pentagonal" block reflector H.

C consists of two tiles, C1 and C2.

If side == Left:

[ I  ] <== k-by-k    C = [ C1 ]  <== k-by-n
[ V2 ] <== m-by-k        [ C2 ]  <== m-by-n

and on exit, \(C = op(Q) C\). l, k, m are the same in tpqrt; n here is different.

If side == Right:

C = [ C1  C2 ]      [ I  ] <== k-by-k
  m-by-k  m-by-n    [ V2 ] <== n-by-k

and on exit, \(C = C op(Q)\). l, k are the same in tpqrt; n = tpqrt's m; m here is different.

Q is a product of block reflectors,

Q = \prod_{j = 1, ..., r} I - Vj Tj Vj^H

where r is the number of blocks, Tj is the j-th block of T, and Vj is the j-th block column of V, with internal blocking size ib.

See Further Details in tpqrt.

Parameters
[in]side
  • Side::Left: Multiply from the left: \(C = op(Q) C\).
  • Side::Right: Multiply from the right: \(C = C op(Q)\).
[in]op
  • Op::NoTrans: Multiply by \(op(Q) = Q\).
  • Op::Trans: Multiply by \(op(Q) = Q^T\) (only in real case).
  • Op::ConjTrans: Multiply by \(op(Q) = Q^H\).
[in]lThe number of rows of the upper trapezoidal part of V2.
  • If side = left, min( m, k ) >= l >= 0.
  • If side = right, min( n, k ) >= l >= 0.
[in]V2The M-by-k, upper pentagonal matrix V2, in an M2-by-k tile, where M2 >= M.
  • If side == Left, M = m.
  • If side == Right, M = n. The i-th column must contain the vector which defines the elementary reflector H(i), for i = 1, 2, ..., k, as returned by tpqrt in A2. The top (M-l)-by-k portion is rectangular, the bottom l-by-k portion is upper trapezoidal. See Further Details in tpqrt.
[in]TThe upper triangular factors of the block reflectors as returned by tpqrt, stored as an ib-by-k tile.
[in,out]C1
  • If side == Left, the k-by-n matrix C1, in a k1-by-n tile, k1 >= k.
  • If side == Right, the m-by-k matrix C1, in an m-by-k1 tile, k1 >= k. On exit, C1 is overwritten by the corresponding block of \(op(Q) C\) or \(C op(Q)\).
[in,out]C2The m-by-n matrix C2.
  • If side == Left, C2 is in an m2-by-n tile, where m2 >= m.
  • If side == Right, C2 is in an m-by-n2 tile, where n2 >= n. On exit, C2 is overwritten by the corresponding block of \(op(Q) C\) or \(C op(Q)\).

Note: compared to LAPACK, A is renamed here => C1, B => C2, V => V2.

◆ tpqrt()

template<typename scalar_t >
void slate::tile::tpqrt ( int64_t  l,
Tile< scalar_t >  A1,
Tile< scalar_t >  A2,
Tile< scalar_t >  T 
)

Compute the triangular-pentagonal QR factorization of 2 tiles, A1 and A2.

On exit, the pentagonal tile A2 has been eliminated.

Parameters
[in]lThe number of rows of the upper trapezoidal part of A2. min( m, k ) >= l >= 0. See Further Details.
[in,out]A1On entry, the k-by-k upper triangular matrix A1, in a k1-by-k tile, where k1 >= k.
[in,out]A2On entry, the m-by-k upper pentagonal matrix A2, in an m2-by-k tile, where m2 >= m. On exit, the columns represent the Householder reflectors. The top (m-l)-by-k portion is rectangular, the bottom l-by-k portion is upper trapezoidal.
[out]TArray of size ib-by-k, where ib is the internal blocking to use, 1 <= ib <= k, in an ib3-by-k3 tile, ib3 >= ib and k3 >= k. On exit, stores a sequence of ib-by-ib upper triangular T matrices representing the block Householder reflectors. See Further Details.

Further Details

Let A be the (k + m)-by-k matrix

A = [ A1 ]  <== k-by-k upper triangular
    [ A2 ]  <== m-by-k upper pentagonal

For all cases, A1 is upper triangular. Example with k = 4, with . representing non-zeros.

A1 = [ . . . . ]  <== k-by-k upper triangular
     [   . . . ]
     [     . . ]
     [       . ]
     [- - - - -]
     [         ]  <== if k1 > k, all-zero rows are ignored

Depending on m, k, l, there are several cases for A2. Currently, SLATE supports only cases 1, 2, and 3. It assumes m = min( A2.mb, A2.nb ), and l = m or l = 0.

Case 1: m = k = l, A2 is upper triangular. Example with m = k = l = 4.

A2 = [ . . . . ]  <== l-by-k upper triangular
     [   . . . ]
     [     . . ]
     [       . ]
     [- - - - -]
     [         ]  <== if m2 > m, all-zero rows are ignored

Case 2: m < k and l = min( m, k ) = m, A2 is upper trapezoidal (wide). Example with m = l = 3, k = 4.

A2 = [ . . . . ]  <== l-by-k upper trapezoidal
     [   . . . ]
     [     . . ]

Case 3: l = 0, A2 is just the rectangular portion. Currently unused in SLATE, but should work. Example with m = 3, l = 0, k = 4.

A2 = [ . . . . ]  <== m-by-k rectangular
     [ . . . . ]
     [ . . . . ]

Case 4: m > k and l = k, A2 is upper trapezoidal (tall). Currently unsupported in SLATE; would require explicitly passing m. Example with m = 6, l = k = 4.

A2 = [ . . . . ]  <== (m - l)-by-k rectangular
     [ . . . . ]
     [---------]
     [ . . . . ]  <== l-by-k upper triangular
     [   . . . ]
     [     . . ]
     [       . ]

Case 5: 0 < l < min( m, k ), A2 is upper pentagonal. Currently unsupported in SLATE; would require explicitly passing m. Example with m = 5, l = 3, k = 4.

A2 = [ . . . . ]  <== (m - l)-by-k rectangular
     [ . . . . ]
     [---------]
     [ . . . . ]  <== l-by-k upper trapezoidal
     [   . . . ]
     [     . . ]

After factoring, the vector representing the elementary reflector H(i) is in the i-th column of the (m + k)-by-k matrix V:

V = [ I  ] <== k-by-k identity
    [ V2 ] <== m-by-k pentagonal, same form as A2.

Thus, all of the information needed for V is contained in V2, which has the same form as A2 and overwrites A2 on exit.

The number of blocks is r = ceiling( k/ib ), where each block is of order ib except for the last block, which is of order rb = k - (r-1)*ib. For each of the r blocks, an upper triangular block reflector factor is computed: T1, T2, ..., Tr. The ib-by-ib (and rb-by-rb for the last block) T's are stored in the ib-by-k matrix T as

T = [ T1 T2 ... Tr ]

Note: compared to LAPACK, A is renamed here => A1, B => A2, W => V, V => V2, and n => k. This makes k match k in tpmqrt.