SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
gemm: General matrix multiply

\(C = \alpha A B + \beta C\) More...

Functions

template<typename scalar_t >
void slate::gemm (scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Options const &opts)
 Distributed parallel general matrix-matrix multiplication.
 
template<typename scalar_t >
void slate::gemmA (scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Options const &opts)
 Distributed parallel general matrix-matrix multiplication.
 
template<typename scalar_t >
void slate::gemmC (scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Options const &opts)
 Distributed parallel general matrix-matrix multiplication.
 

Detailed Description

\(C = \alpha A B + \beta C\)

Function Documentation

◆ gemm()

template<typename scalar_t >
void slate::gemm( scalar_t alpha,
                  Matrix< scalar_t >& A,
                  Matrix< scalar_t >& B,
                  scalar_t beta,
                  Matrix< scalar_t >& C,
                  Options const& opts )

Distributed parallel general matrix-matrix multiplication.

Performs the matrix-matrix operation

\[ C = \alpha A B + \beta C, \]

where alpha and beta are scalars, and \(A\), \(B\), and \(C\) are matrices, with \(A\) an m-by-k matrix, \(B\) a k-by-n matrix, and \(C\) an m-by-n matrix. The matrices can be transposed or conjugate-transposed beforehand, e.g.,

auto AT = slate::transpose( A );
auto BT = slate::conj_transpose( B );
slate::gemm( alpha, AT, BT, beta, C );

Complexity (in real arithmetic): \(2 m n k\) flops.
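For example (illustrative sizes), with \(m = n = k = 10^4\) this amounts to \(2 \times 10^{12}\) flops.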

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in] alpha: The scalar alpha.
[in] A: The m-by-k matrix A.
[in] B: The k-by-n matrix B.
[in] beta: The scalar beta.
[in,out] C: On entry, the m-by-n matrix C. On exit, overwritten by the result \(\alpha A B + \beta C\).
[in] opts: Additional options, as a map of name = value pairs (see the usage sketch after this list). Possible options:
  • Option::Lookahead: Number of blocks to overlap communication and computation. lookahead >= 0. Default 1.
  • Option::MethodGemm: Select which variant routine to call. Possible values:
    • Auto: let the routine decide [default].
    • gemmA: select the gemmA routine.
    • gemmC: select the gemmC routine.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
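
A minimal usage sketch passing options through the opts map. The matrix sizes, tile size nb, and the p-by-q process grid below are illustrative assumptions, not part of this routine's documentation; MPI is assumed to be initialized by the caller.

#include <slate/slate.hh>
#include <mpi.h>

// Illustrative sketch: sizes, tile size, and process grid are assumptions.
void gemm_example( int64_t m, int64_t n, int64_t k, int64_t nb, int p, int q )
{
    // Distributed m-by-k, k-by-n, m-by-n matrices on a p-by-q MPI process grid.
    slate::Matrix<double> A( m, k, nb, p, q, MPI_COMM_WORLD );
    slate::Matrix<double> B( k, n, nb, p, q, MPI_COMM_WORLD );
    slate::Matrix<double> C( m, n, nb, p, q, MPI_COMM_WORLD );
    A.insertLocalTiles();
    B.insertLocalTiles();
    C.insertLocalTiles();
    // ... fill A, B, C with data ...

    // Run on GPU devices with a lookahead of 2 blocks.
    slate::Options opts = {
        { slate::Option::Target,    slate::Target::Devices },
        { slate::Option::Lookahead, 2 },
    };

    // C = 1.0 * A * B + 0.0 * C
    slate::gemm( 1.0, A, B, 0.0, C, opts );
}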

◆ gemmA()

template<typename scalar_t >
void slate::gemmA( scalar_t alpha,
                   Matrix< scalar_t >& A,
                   Matrix< scalar_t >& B,
                   scalar_t beta,
                   Matrix< scalar_t >& C,
                   Options const& opts )

Distributed parallel general matrix-matrix multiplication.

Performs the matrix-matrix operation

\[ C = \alpha A B + \beta C, \]

where alpha and beta are scalars, and \(A\), \(B\), and \(C\) are matrices, with \(A\) an m-by-k matrix, \(B\) a k-by-n matrix, and \(C\) an m-by-n matrix. The matrices can be transposed or conjugate-transposed beforehand, e.g.,

auto AT = slate::transpose( A );
auto BT = slate::conj_transpose( B );
slate::gemmA( alpha, AT, BT, beta, C );

This algorithmic variant keeps the computation local to the processes that own the \(A\) matrix, which can be useful when \(A\) is much larger than \(B\) and \(C\) (size(A) >> size(B), size(C)).
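
A minimal sketch of calling this variant directly; the sizes, tile size nb, and process grid are illustrative assumptions, and the same variant can also be selected through slate::gemm with Option::MethodGemm set to gemmA.

#include <slate/slate.hh>
#include <mpi.h>

// Illustrative sketch: A is m-by-k and large, while B and C are narrow
// (small n), so keeping the work near A's tiles limits data movement.
void gemmA_example( int64_t m, int64_t k, int64_t n, int64_t nb, int p, int q )
{
    slate::Matrix<double> A( m, k, nb, p, q, MPI_COMM_WORLD );  // large
    slate::Matrix<double> B( k, n, nb, p, q, MPI_COMM_WORLD );  // narrow
    slate::Matrix<double> C( m, n, nb, p, q, MPI_COMM_WORLD );  // narrow
    A.insertLocalTiles();
    B.insertLocalTiles();
    C.insertLocalTiles();
    // ... fill A, B, C with data ...

    slate::Options opts = { { slate::Option::Target, slate::Target::HostTask } };

    // C = 1.0 * A * B + 0.0 * C, computed local to A's distribution.
    slate::gemmA( 1.0, A, B, 0.0, C, opts );
}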

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in] alpha: The scalar alpha.
[in] A: The m-by-k matrix A.
[in] B: The k-by-n matrix B.
[in] beta: The scalar beta.
[in,out] C: On entry, the m-by-n matrix C. On exit, overwritten by the result \(\alpha A B + \beta C\).
[in] opts: Additional options, as a map of name = value pairs. Possible options:
  • Option::Lookahead: Number of blocks to overlap communication and computation. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.

◆ gemmC()

template<typename scalar_t >
void slate::gemmC( scalar_t alpha,
                   Matrix< scalar_t >& A,
                   Matrix< scalar_t >& B,
                   scalar_t beta,
                   Matrix< scalar_t >& C,
                   Options const& opts )

Distributed parallel general matrix-matrix multiplication.

Performs the matrix-matrix operation

\[ C = \alpha A B + \beta C, \]

where alpha and beta are scalars, and \(A\), \(B\), and \(C\) are matrices, with \(A\) an m-by-k matrix, \(B\) a k-by-n matrix, and \(C\) an m-by-n matrix. The matrices can be transposed or conjugate-transposed beforehand, e.g.,

auto AT = slate::transpose( A );
auto BT = slate::conj_transpose( B );
slate::gemmC( alpha, AT, BT, beta, C );

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in] alpha: The scalar alpha.
[in] A: The m-by-k matrix A.
[in] B: The k-by-n matrix B.
[in] beta: The scalar beta.
[in,out] C: On entry, the m-by-n matrix C. On exit, overwritten by the result \(\alpha A B + \beta C\).
[in] opts: Additional options, as a map of name = value pairs. Possible options:
  • Option::Lookahead: Number of blocks to overlap communication and computation. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.