SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
gemm: General matrix multiply

Functions

template<Target target = Target::HostTask, typename scalar_t >
void slate::internal::gemm (scalar_t alpha, Matrix< scalar_t > &&A, Matrix< scalar_t > &&B, scalar_t beta, Matrix< scalar_t > &&C, Layout layout, int priority, int64_t queue_index)
 General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
 
template<typename scalar_t >
void slate::internal::gemm (internal::TargetType< Target::HostTask >, scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Layout layout, int priority, int64_t queue_index)
 General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
 
template<typename scalar_t >
void slate::internal::gemm (internal::TargetType< Target::HostNest >, scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Layout layout, int priority, int64_t queue_index)
 General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
 
template<typename scalar_t >
void slate::internal::gemm (internal::TargetType< Target::HostBatch >, scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Layout layout, int priority, int64_t queue_index)
 General matrix multiply to update trailing matrix, where A is a single block column (mt tiles by 1 tile), B is a single block row (1 tile by nt tiles), and C is mt tiles by nt tiles.
 
template<typename scalar_t >
void slate::internal::gemm (internal::TargetType< Target::Devices >, scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Layout layout, int priority, int64_t queue_index)
 General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
 
template<Target target = Target::HostTask, typename scalar_t >
void slate::internal::gemmA (scalar_t alpha, Matrix< scalar_t > &&A, Matrix< scalar_t > &&B, scalar_t beta, Matrix< scalar_t > &&C, Layout layout, int priority, int64_t queue_index)
 General matrix multiply for a left-looking update.
 
template<typename scalar_t >
void slate::internal::gemmA (internal::TargetType< Target::HostTask >, scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Layout layout, int priority, int queue_index)
 General matrix multiply for a left-looking update.
 
template<typename scalar_t >
void slate::internal::gemmA (internal::TargetType< Target::Devices >, scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Layout layout, int priority, int64_t queue_index)
 General matrix multiply for a left-looking update (TODO): GPU device batched-BLAS implementation.
 

Detailed Description

Function Documentation

◆ gemm() [1/5]

template<typename scalar_t>
void slate::internal::gemm( internal::TargetType< Target::Devices >,
                            scalar_t alpha,
                            Matrix< scalar_t >& A,
                            Matrix< scalar_t >& B,
                            scalar_t beta,
                            Matrix< scalar_t >& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )

General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

GPU device batched-BLAS implementation. GPU can use either ColMajor or RowMajor.

◆ gemm() [2/5]

template<typename scalar_t>
void slate::internal::gemm( internal::TargetType< Target::HostBatch >,
                            scalar_t alpha,
                            Matrix< scalar_t >& A,
                            Matrix< scalar_t >& B,
                            scalar_t beta,
                            Matrix< scalar_t >& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )

General matrix multiply to update trailing matrix, where A is a single block column (mt tiles by 1 tile), B is a single block row (1 tile by nt tiles), and C is mt tiles by nt tiles.

Host batched implementation.

◆ gemm() [3/5]

template<typename scalar_t>
void slate::internal::gemm( internal::TargetType< Target::HostNest >,
                            scalar_t alpha,
                            Matrix< scalar_t >& A,
                            Matrix< scalar_t >& B,
                            scalar_t beta,
                            Matrix< scalar_t >& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )

General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

Host nested OpenMP implementation.

◆ gemm() [4/5]

template<typename scalar_t>
void slate::internal::gemm( internal::TargetType< Target::HostTask >,
                            scalar_t alpha,
                            Matrix< scalar_t >& A,
                            Matrix< scalar_t >& B,
                            scalar_t beta,
                            Matrix< scalar_t >& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )

General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

Host OpenMP task implementation.

◆ gemm() [5/5]

template<Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemm( scalar_t alpha,
                            Matrix< scalar_t >&& A,
                            Matrix< scalar_t >&& B,
                            scalar_t beta,
                            Matrix< scalar_t >&& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )

General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

Dispatches to target implementations. In the complex case, if \(op(C)\) is transpose, then \(op(A)\) and \(op(B)\) cannot be conj_transpose; if \(op(C)\) is conj_transpose, then \(op(A)\) and \(op(B)\) cannot be transpose.

Parameters
[in] layout    Indicates the Layout (ColMajor/RowMajor) to operate with. Local tiles of matrix C and corresponding tiles of A & B on target devices will be converted to layout.
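
A minimal calling sketch may help. It assumes SLATE's internal headers are available and that the matrices were already constructed and distributed by the surrounding algorithm; the wrapper name rank_k_update_sketch, the chosen target, and the priority/queue values are illustrative, not part of SLATE.

#include <utility>  // std::move; SLATE internal headers (Matrix, Target, Layout, internal::gemm) assumed included

template <typename scalar_t>
void rank_k_update_sketch(
    slate::Matrix<scalar_t>& A,   // single block column: mt x 1 tiles
    slate::Matrix<scalar_t>& B,   // single block row:    1 x nt tiles
    slate::Matrix<scalar_t>& C,   // trailing matrix:     mt x nt tiles
    scalar_t alpha, scalar_t beta )
{
    // Dispatch to the HostTask implementation; HostNest, HostBatch, and
    // Devices use the same call shape via the target template parameter.
    slate::internal::gemm<slate::Target::HostTask>(
        alpha, std::move(A),
               std::move(B),
        beta,  std::move(C),
        slate::Layout::ColMajor,
        /* priority    */ 0,
        /* queue_index */ 0 );
}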

◆ gemmA() [1/3]

template<typename scalar_t>
void slate::internal::gemmA( internal::TargetType< Target::Devices >,
                             scalar_t alpha,
                             Matrix< scalar_t >& A,
                             Matrix< scalar_t >& B,
                             scalar_t beta,
                             Matrix< scalar_t >& C,
                             Layout layout,
                             int priority,
                             int64_t queue_index )

General matrix multiply for a left-looking update (TODO): GPU device batched-BLAS implementation.

GPU can use either ColMajor or RowMajor.

◆ gemmA() [2/3]

template<typename scalar_t>
void slate::internal::gemmA( internal::TargetType< Target::HostTask >,
                             scalar_t alpha,
                             Matrix< scalar_t >& A,
                             Matrix< scalar_t >& B,
                             scalar_t beta,
                             Matrix< scalar_t >& C,
                             Layout layout,
                             int priority,
                             int queue_index )

General matrix multiply for a left-looking update.

This routine consists of two steps: memory management and the actual computation. Note that it may insert tiles that must be erased later; the original intent was to erase them when calling the ListReduce routine.

First step: iterate over the tiles of A and get the needed tiles of B and C. Where a corresponding tile of C does not exist, acquire it, meaning the tile is created and inserted as workspace. These workspace tiles are expected to be erased either by the calling routine or by the call to ListReduce.

Second step: as soon as the data is ready, call gemm. However, beta is used once, and only when the current tile of C existed prior to the call to this routine; otherwise beta is replaced by zero. In either case, the internal value beta_j is then set to one to allow accumulation into the tile.

Host OpenMP task implementation.
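
The beta bookkeeping described above can be illustrated with a standalone toy program in which scalars stand in for tiles. This is not SLATE code, only a sketch of the control flow under the stated rules: a pre-existing C tile sees beta once, a workspace tile sees zero, and all subsequent products accumulate with beta_j = 1.

#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    double alpha = 1.0, beta = 0.5;

    // "Tiles" of A (one block column, 3 x 2 local tiles) and B (one block row).
    std::vector<std::vector<double>> A = {{1, 2}, {3, 4}, {5, 6}};
    std::vector<double> B = {10, 20};

    // C tiles: only C[0] existed before the call; the others are
    // inserted as zero-valued workspace (step 1: memory management).
    std::vector<double> C = {7, 0, 0};
    std::vector<bool> existed = {true, false, false};

    // Step 2: computation. beta is applied once, and only to tiles of C
    // that existed prior to the call; afterwards beta_j = 1 so further
    // products accumulate into the tile.
    for (std::size_t i = 0; i < C.size(); ++i) {
        double beta_j = existed[i] ? beta : 0.0;
        for (std::size_t j = 0; j < A[i].size(); ++j) {
            C[i] = alpha * A[i][j] * B[j] + beta_j * C[i];
            beta_j = 1.0;
        }
        std::printf("C[%zu] = %g\n", i, C[i]);
    }
    return 0;
}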

◆ gemmA() [3/3]

template<Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemmA( scalar_t alpha,
                             Matrix< scalar_t >&& A,
                             Matrix< scalar_t >&& B,
                             scalar_t beta,
                             Matrix< scalar_t >&& C,
                             Layout layout,
                             int priority,
                             int64_t queue_index )

General matrix multiply for a left-looking update.

Does computation where the A tiles are local. If needed (e.g., not local), inserts C tiles and sets beta to 0. On output, partial products \(C_{ik}\) exist distributed wherever \(A_{ij}\) is local, for all i = 0 : A.mt, j = 0 : A.nt, k =.

Dispatches to target implementations. In the complex case, if \(op(C)\) is transpose, then \(op(A)\) and \(op(B)\) cannot be conj_transpose; if \(op(C)\) is conj_transpose, then \(op(A)\) and \(op(B)\) cannot be transpose.
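
For orientation, a hedged sketch of the dispatcher call shape, based only on the signature above. Matrix construction, tile locality, and the subsequent combination of the partial C tiles (e.g., by the ListReduce routine mentioned earlier) are assumed and not shown; the wrapper name left_looking_update_sketch is illustrative.

#include <utility>  // std::move; SLATE internal headers assumed included

template <typename scalar_t>
void left_looking_update_sketch(
    slate::Matrix<scalar_t>& A,   // tiles of A are local on this rank
    slate::Matrix<scalar_t>& B,
    slate::Matrix<scalar_t>& C,   // receives partial products of C
    scalar_t alpha, scalar_t beta )
{
    slate::internal::gemmA<slate::Target::HostTask>(
        alpha, std::move(A),
               std::move(B),
        beta,  std::move(C),
        slate::Layout::ColMajor,
        /* priority    */ 0,
        /* queue_index */ 0 );
    // The partial products of C now live wherever the corresponding A
    // tiles are local; they still need to be combined across ranks.
}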