SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
gemm: General matrix multiply

\(C = \alpha A B + \beta C\) More...

Functions

template<typename scalar_t >
void slate::gemm (scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Options const &opts)
 Distributed parallel general matrix-matrix multiplication.
 
template<typename scalar_t >
void slate::gemmA (scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Options const &opts)
 Distributed parallel general matrix-matrix multiplication.
 
template<typename scalar_t >
void slate::gemmC (scalar_t alpha, Matrix< scalar_t > &A, Matrix< scalar_t > &B, scalar_t beta, Matrix< scalar_t > &C, Options const &opts)
 Distributed parallel general matrix-matrix multiplication.
 

Detailed Description

\(C = \alpha A B + \beta C\)

Function Documentation

◆ gemm()

template<typename scalar_t >
void slate::gemm( scalar_t alpha,
                  Matrix< scalar_t >& A,
                  Matrix< scalar_t >& B,
                  scalar_t beta,
                  Matrix< scalar_t >& C,
                  Options const& opts )

Distributed parallel general matrix-matrix multiplication.

Performs the matrix-matrix operation

\[ C = \alpha A B + \beta C, \]

where alpha and beta are scalars, and \(A\), \(B\), and \(C\) are matrices, with \(A\) an m-by-k matrix, \(B\) a k-by-n matrix, and \(C\) an m-by-n matrix. The matrices can be transposed or conjugate-transposed beforehand, e.g.,

auto AT = slate::transpose( A );
auto BT = slate::conj_transpose( B );
slate::gemm( alpha, AT, BT, beta, C );

Complexity (in real arithmetic): \(2 m n k\) flops.
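For example (illustrative sizes), with \(m = n = k = 10^4\) this amounts to \(2 \times 10^{12}\) flops.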

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in] alpha: The scalar alpha.
[in] A: The m-by-k matrix A.
[in] B: The k-by-n matrix B.
[in] beta: The scalar beta.
[in,out] C: On entry, the m-by-n matrix C. On exit, overwritten by the result \(\alpha A B + \beta C\).
[in] opts: Additional options, as a map of name = value pairs (see the usage sketch after this list). Possible options:
  • Option::Lookahead: Number of blocks to overlap communication and computation. lookahead >= 0. Default 1.
  • Option::MethodGemm: Select which variant routine to call. Possible values:
    • Auto: let the routine decide [default].
    • gemmA: select the gemmA routine.
    • gemmC: select the gemmC routine.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.
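
A minimal usage sketch passing options through the opts map. The matrix sizes, tile size nb, and the p-by-q process grid below are illustrative assumptions, not part of this routine's documentation; MPI is assumed to be initialized by the caller.

#include <slate/slate.hh>
#include <mpi.h>

// Illustrative sketch: sizes, tile size, and process grid are assumptions.
void gemm_example( int64_t m, int64_t n, int64_t k, int64_t nb, int p, int q )
{
    // Distributed m-by-k, k-by-n, m-by-n matrices on a p-by-q MPI process grid.
    slate::Matrix<double> A( m, k, nb, p, q, MPI_COMM_WORLD );
    slate::Matrix<double> B( k, n, nb, p, q, MPI_COMM_WORLD );
    slate::Matrix<double> C( m, n, nb, p, q, MPI_COMM_WORLD );
    A.insertLocalTiles();
    B.insertLocalTiles();
    C.insertLocalTiles();
    // ... fill A, B, C with data ...

    // Run on GPU devices with a lookahead of 2 blocks.
    slate::Options opts = {
        { slate::Option::Target,    slate::Target::Devices },
        { slate::Option::Lookahead, 2 },
    };

    // C = 1.0 * A * B + 0.0 * C
    slate::gemm( 1.0, A, B, 0.0, C, opts );
}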

◆ gemmA()

template<typename scalar_t >
void slate::gemmA( scalar_t alpha,
                   Matrix< scalar_t >& A,
                   Matrix< scalar_t >& B,
                   scalar_t beta,
                   Matrix< scalar_t >& C,
                   Options const& opts )

Distributed parallel general matrix-matrix multiplication.

Performs the matrix-matrix operation

\[ C = \alpha A B + \beta C, \]

where alpha and beta are scalars, and \(A\), \(B\), and \(C\) are matrices, with \(A\) an m-by-k matrix, \(B\) a k-by-n matrix, and \(C\) an m-by-n matrix. The matrices can be transposed or conjugate-transposed beforehand, e.g.,

auto AT = slate::transpose( A );
auto BT = slate::conj_transpose( B );
slate::gemmA( alpha, AT, BT, beta, C );

This algorithmic variant keeps the computation local to the processes that own the \(A\) matrix, which can be useful when \(A\) is much larger than \(B\) and \(C\) (size(A) >> size(B), size(C)).
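
A minimal sketch of calling this variant directly; the sizes, tile size nb, and process grid are illustrative assumptions, and the same variant can also be selected through slate::gemm with Option::MethodGemm set to gemmA.

#include <slate/slate.hh>
#include <mpi.h>

// Illustrative sketch: A is m-by-k and large, while B and C are narrow
// (small n), so keeping the work near A's tiles limits data movement.
void gemmA_example( int64_t m, int64_t k, int64_t n, int64_t nb, int p, int q )
{
    slate::Matrix<double> A( m, k, nb, p, q, MPI_COMM_WORLD );  // large
    slate::Matrix<double> B( k, n, nb, p, q, MPI_COMM_WORLD );  // narrow
    slate::Matrix<double> C( m, n, nb, p, q, MPI_COMM_WORLD );  // narrow
    A.insertLocalTiles();
    B.insertLocalTiles();
    C.insertLocalTiles();
    // ... fill A, B, C with data ...

    slate::Options opts = { { slate::Option::Target, slate::Target::HostTask } };

    // C = 1.0 * A * B + 0.0 * C, computed local to A's distribution.
    slate::gemmA( 1.0, A, B, 0.0, C, opts );
}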

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in] alpha: The scalar alpha.
[in] A: The m-by-k matrix A.
[in] B: The k-by-n matrix B.
[in] beta: The scalar beta.
[in,out] C: On entry, the m-by-n matrix C. On exit, overwritten by the result \(\alpha A B + \beta C\).
[in] opts: Additional options, as a map of name = value pairs. Possible options:
  • Option::Lookahead: Number of blocks to overlap communication and computation. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.

◆ gemmC()

template<typename scalar_t >
void slate::gemmC( scalar_t alpha,
                   Matrix< scalar_t >& A,
                   Matrix< scalar_t >& B,
                   scalar_t beta,
                   Matrix< scalar_t >& C,
                   Options const& opts )

Distributed parallel general matrix-matrix multiplication.

Performs the matrix-matrix operation

\[ C = \alpha A B + \beta C, \]

where alpha and beta are scalars, and \(A\), \(B\), and \(C\) are matrices, with \(A\) an m-by-k matrix, \(B\) a k-by-n matrix, and \(C\) an m-by-n matrix. The matrices can be transposed or conjugate-transposed beforehand, e.g.,

auto AT = slate::transpose( A );
auto BT = slate::conj_transpose( B );
slate::gemmC( alpha, AT, BT, beta, C );

Template Parameters
scalar_t: One of float, double, std::complex<float>, std::complex<double>.
Parameters
[in] alpha: The scalar alpha.
[in] A: The m-by-k matrix A.
[in] B: The k-by-n matrix B.
[in] beta: The scalar beta.
[in,out] C: On entry, the m-by-n matrix C. On exit, overwritten by the result \(\alpha A B + \beta C\).
[in] opts: Additional options, as a map of name = value pairs. Possible options:
  • Option::Lookahead: Number of blocks to overlap communication and computation. lookahead >= 0. Default 1.
  • Option::Target: Implementation to target. Possible values:
    • HostTask: OpenMP tasks on CPU host [default].
    • HostNest: nested OpenMP parallel for loop on CPU host.
    • HostBatch: batched BLAS on CPU host.
    • Devices: batched BLAS on GPU device.