SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Functions

template <Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemm( scalar_t alpha, Matrix<scalar_t>&& A, Matrix<scalar_t>&& B,
                            scalar_t beta, Matrix<scalar_t>&& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostTask>,
                            scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                            scalar_t beta, Matrix<scalar_t>& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostNest>,
                            scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                            scalar_t beta, Matrix<scalar_t>& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostBatch>,
                            scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                            scalar_t beta, Matrix<scalar_t>& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column (mt tiles by 1 tile), B is a single block row (1 tile by nt tiles), and C is mt tiles by nt tiles.

template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::Devices>,
                            scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                            scalar_t beta, Matrix<scalar_t>& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

template <Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemmA( scalar_t alpha, Matrix<scalar_t>&& A, Matrix<scalar_t>&& B,
                             scalar_t beta, Matrix<scalar_t>&& C,
                             Layout layout, int priority, int64_t queue_index )
    General matrix multiply for a left-looking update.

template <typename scalar_t>
void slate::internal::gemmA( internal::TargetType<Target::HostTask>,
                             scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                             scalar_t beta, Matrix<scalar_t>& C,
                             Layout layout, int priority, int queue_index )
    General matrix multiply for a left-looking update.

template <typename scalar_t>
void slate::internal::gemmA( internal::TargetType<Target::Devices>,
                             scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                             scalar_t beta, Matrix<scalar_t>& C,
                             Layout layout, int priority, int64_t queue_index )
    General matrix multiply for a left-looking update. GPU device batched-BLAS implementation.
template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::Devices>,
                            scalar_t alpha,
                            Matrix<scalar_t>& A,
                            Matrix<scalar_t>& B,
                            scalar_t beta,
                            Matrix<scalar_t>& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
GPU device batched-BLAS implementation. GPU can use either ColMajor or RowMajor.
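As a conceptual illustration only (not SLATE's actual source), the device implementation can be thought of as gathering one tile-level gemm per local tile of C and issuing the whole group as a single batched call; the gather_batch helper below is a hypothetical sketch:

```cpp
#include <cstdint>
#include <vector>
#include <slate/slate.hh>

// Illustrative sketch: collect pointer arrays for a batched gemm, one
// entry per local tile of C. Since A is a single block column and B a
// single block row, each C(i, j) pairs with exactly A(i, 0) and B(0, j).
template <typename scalar_t>
void gather_batch( slate::Matrix<scalar_t>& A,    // mt tiles x 1 tile
                   slate::Matrix<scalar_t>& B,    // 1 tile x nt tiles
                   slate::Matrix<scalar_t>& C,    // mt tiles x nt tiles
                   std::vector<scalar_t*>& a_array,
                   std::vector<scalar_t*>& b_array,
                   std::vector<scalar_t*>& c_array )
{
    for (int64_t i = 0; i < C.mt(); ++i) {
        for (int64_t j = 0; j < C.nt(); ++j) {
            if (C.tileIsLocal( i, j )) {
                a_array.push_back( A( i, 0 ).data() );
                b_array.push_back( B( 0, j ).data() );
                c_array.push_back( C( i, j ).data() );
            }
        }
    }
    // A single batched-BLAS gemm over these arrays then computes
    // C(i, j) = alpha*A(i, 0)*B(0, j) + beta*C(i, j) for every entry,
    // in either ColMajor or RowMajor layout.
}
```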
template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostBatch>,
                            scalar_t alpha,
                            Matrix<scalar_t>& A,
                            Matrix<scalar_t>& B,
                            scalar_t beta,
                            Matrix<scalar_t>& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column (mt tiles by 1 tile), B is a single block row (1 tile by nt tiles), and C is mt tiles by nt tiles.
Host batched implementation.
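Restating the tile shapes above as a formula, this variant performs the block outer-product update

\[
C_{ij} \leftarrow \alpha\, op(A)_{i0}\, op(B)_{0j} + \beta\, C_{ij},
\qquad i = 0:mt-1, \quad j = 0:nt-1,
\]

issuing all local \((i, j)\) updates through one batched-BLAS call on the host.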
template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostNest>,
                            scalar_t alpha,
                            Matrix<scalar_t>& A,
                            Matrix<scalar_t>& B,
                            scalar_t beta,
                            Matrix<scalar_t>& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
Host nested OpenMP implementation.
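A minimal sketch of the nested-loop pattern the name suggests, under the assumption that each local tile of C gets one tile-level gemm; tile_gemm is a hypothetical stand-in, not a SLATE API:

```cpp
// Illustrative pattern only: parallelize over the 2-D tile grid of C
// with a collapsed OpenMP loop, one tile-level gemm per local tile.
// tile_gemm is a hypothetical helper, not SLATE's actual source.
#pragma omp parallel for collapse(2) schedule(dynamic)
for (int64_t i = 0; i < C.mt(); ++i) {
    for (int64_t j = 0; j < C.nt(); ++j) {
        if (C.tileIsLocal( i, j ))
            tile_gemm( alpha, A( i, 0 ), B( 0, j ), beta, C( i, j ) );
    }
}
```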
template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostTask>,
                            scalar_t alpha,
                            Matrix<scalar_t>& A,
                            Matrix<scalar_t>& B,
                            scalar_t beta,
                            Matrix<scalar_t>& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
Host OpenMP task implementation.
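In contrast to the nested-loop variant, the task-based pattern can be sketched as one OpenMP task per local tile of C (illustrative only; tile_gemm is again a hypothetical stand-in):

```cpp
// Illustrative pattern only: one OpenMP task per local tile of C, so
// independent tile updates can be scheduled out of order.
for (int64_t i = 0; i < C.mt(); ++i) {
    for (int64_t j = 0; j < C.nt(); ++j) {
        if (C.tileIsLocal( i, j )) {
            #pragma omp task shared( A, B, C )
            tile_gemm( alpha, A( i, 0 ), B( 0, j ), beta, C( i, j ) );
        }
    }
}
#pragma omp taskwait
```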
template <Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemm( scalar_t alpha,
                            Matrix<scalar_t>&& A,
                            Matrix<scalar_t>&& B,
                            scalar_t beta,
                            Matrix<scalar_t>&& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
Dispatches to target implementations. In the complex case, if \(op(C)\) is transpose, then \(op(A)\) and \(op(B)\) cannot be conj_transpose; if \(op(C)\) is conj_transpose, then \(op(A)\) and \(op(B)\) cannot be transpose.
Parameters
    [in] layout    Indicates the Layout (ColMajor/RowMajor) to operate with. Local tiles of matrix C and corresponding tiles of A and B on target devices will be converted to layout.
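A hedged usage sketch of the dispatcher (internal routines are normally called from within SLATE itself; A_panel, B_row, C_trail, and the chosen scalars are illustrative names, not SLATE code):

```cpp
#include <utility>
#include <slate/slate.hh>

// Illustrative call site: a right-looking trailing-matrix update
// C_trail = C_trail - A_panel * B_row, dispatched to the HostTask target.
template <typename scalar_t>
void update_trailing( slate::Matrix<scalar_t>& A_panel,   // mt tiles x 1 tile
                      slate::Matrix<scalar_t>& B_row,     // 1 tile x nt tiles
                      slate::Matrix<scalar_t>& C_trail )  // mt tiles x nt tiles
{
    slate::internal::gemm<slate::Target::HostTask>(
        scalar_t( -1.0 ), std::move( A_panel ),
                          std::move( B_row ),
        scalar_t(  1.0 ), std::move( C_trail ),
        slate::Layout::ColMajor,
        /* priority    */ 0,
        /* queue_index */ 0 );
}
```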
template <typename scalar_t>
void slate::internal::gemmA( internal::TargetType<Target::Devices>,
                             scalar_t alpha,
                             Matrix<scalar_t>& A,
                             Matrix<scalar_t>& B,
                             scalar_t beta,
                             Matrix<scalar_t>& C,
                             Layout layout,
                             int priority,
                             int64_t queue_index )
General matrix multiply for a left-looking update.
GPU device batched-BLAS implementation. GPU can use either ColMajor or RowMajor.
template <typename scalar_t>
void slate::internal::gemmA( internal::TargetType<Target::HostTask>,
                             scalar_t alpha,
                             Matrix<scalar_t>& A,
                             Matrix<scalar_t>& B,
                             scalar_t beta,
                             Matrix<scalar_t>& C,
                             Layout layout,
                             int priority,
                             int queue_index )
General matrix multiply for a left-looking update.
This routine consists of two steps: management of the memory and the actual computation. Note that it may insert some tiles that must be erased later; the original intent was to erase them when calling the ListReduce routine.

First step: iterate over the tiles of A and fetch the needed tiles of B and C. Where a corresponding tile of C does not exist, acquire it, i.e., create it and insert it as workspace; such tiles are expected to be erased either by the calling routine or through the call to ListReduce.

Second step: as soon as the data is ready, call gemm. However, beta is applied at most once, and only when the current tile of C existed prior to this call; otherwise beta is replaced by zero. In either case, the internal value beta_j is then set to one to allow accumulation into the tile.

Host OpenMP task implementation.
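The beta handling described above can be sketched as follows (illustrative only, not SLATE's source; tile_gemm and the c_existed check are hypothetical stand-ins):

```cpp
// Illustrative beta logic: for each output tile C(i, k), beta is applied
// to the first product only, and only if C(i, k) existed before the call;
// afterwards beta_j = 1 so subsequent products accumulate into the tile.
for (int64_t i = 0; i < A.mt(); ++i) {
    for (int64_t k = 0; k < B.nt(); ++k) {
        bool first = true;
        for (int64_t j = 0; j < A.nt(); ++j) {
            if (! A.tileIsLocal( i, j ))
                continue;
            scalar_t beta_j = scalar_t( 1 );   // accumulate by default
            if (first) {
                // c_existed is a hypothetical placeholder: did C(i, k)
                // exist, or was it just inserted as workspace?
                bool c_existed = true;
                beta_j = c_existed ? beta : scalar_t( 0 );
                first = false;
            }
            tile_gemm( alpha, A( i, j ), B( j, k ), beta_j, C( i, k ) );
        }
    }
}
```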
template <Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemmA( scalar_t alpha,
                             Matrix<scalar_t>&& A,
                             Matrix<scalar_t>&& B,
                             scalar_t beta,
                             Matrix<scalar_t>&& C,
                             Layout layout,
                             int priority,
                             int64_t queue_index )
General matrix multiply for a left-looking update.
Does computation where the A tiles are local. If needed (e.g., not local), inserts C tiles and sets beta to 0. On output, partial products Cik exist distributed wherever Aij is local, for all i = 0 : A.mt, j = 0 : A.nt, k = 0 : B.nt.

Dispatches to target implementations. In the complex case, if \(op(C)\) is transpose, then \(op(A)\) and \(op(B)\) cannot be conj_transpose; if \(op(C)\) is conj_transpose, then \(op(A)\) and \(op(B)\) cannot be transpose.
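In formula form, restating the description above: on output each rank holds the partial products

\[
C_{ik} \;=\; \beta'\, C_{ik} \;+\; \alpha \sum_{j \,:\, A_{ij}\,\text{local}} op(A)_{ij}\, op(B)_{jk},
\]

where \(\beta' = \beta\) if \(C_{ik}\) existed before the call and \(\beta' = 0\) if it was inserted as workspace; a subsequent reduction (e.g., the ListReduce routine mentioned for the HostTask variant) combines the partial products across ranks.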