SLATE 2024.05.31
Software for Linear Algebra Targeting Exascale
Functions

template <Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemm( scalar_t alpha, Matrix<scalar_t>&& A, Matrix<scalar_t>&& B,
                            scalar_t beta, Matrix<scalar_t>&& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostTask>,
                            scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                            scalar_t beta, Matrix<scalar_t>& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostNest>,
                            scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                            scalar_t beta, Matrix<scalar_t>& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostBatch>,
                            scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                            scalar_t beta, Matrix<scalar_t>& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column (mt tiles by 1 tile), B is a single block row (1 tile by nt tiles), and C is mt tiles by nt tiles.

template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::Devices>,
                            scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                            scalar_t beta, Matrix<scalar_t>& C,
                            Layout layout, int priority, int64_t queue_index )
    General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.

template <Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemmA( scalar_t alpha, Matrix<scalar_t>&& A, Matrix<scalar_t>&& B,
                             scalar_t beta, Matrix<scalar_t>&& C,
                             Layout layout, int priority, int64_t queue_index )
    General matrix multiply for a left-looking update.

template <typename scalar_t>
void slate::internal::gemmA( internal::TargetType<Target::HostTask>,
                             scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                             scalar_t beta, Matrix<scalar_t>& C,
                             Layout layout, int priority, int queue_index )
    General matrix multiply for a left-looking update.

template <typename scalar_t>
void slate::internal::gemmA( internal::TargetType<Target::Devices>,
                             scalar_t alpha, Matrix<scalar_t>& A, Matrix<scalar_t>& B,
                             scalar_t beta, Matrix<scalar_t>& C,
                             Layout layout, int priority, int64_t queue_index )
    General matrix multiply for a left-looking update. GPU device batched-BLAS implementation.
template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::Devices>,
                            scalar_t alpha,
                            Matrix<scalar_t>& A,
                            Matrix<scalar_t>& B,
                            scalar_t beta,
                            Matrix<scalar_t>& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
GPU device batched-BLAS implementation. GPU can use either ColMajor or RowMajor.
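As a conceptual illustration only (not SLATE's actual source), the device implementation can be thought of as gathering one tile-level gemm per local tile of C and issuing the whole group as a single batched call; the gather_batch helper below is a hypothetical sketch:

```cpp
#include <cstdint>
#include <vector>
#include <slate/slate.hh>

// Illustrative sketch: collect pointer arrays for a batched gemm, one
// entry per local tile of C. Since A is a single block column and B a
// single block row, each C(i, j) pairs with exactly A(i, 0) and B(0, j).
template <typename scalar_t>
void gather_batch( slate::Matrix<scalar_t>& A,    // mt tiles x 1 tile
                   slate::Matrix<scalar_t>& B,    // 1 tile x nt tiles
                   slate::Matrix<scalar_t>& C,    // mt tiles x nt tiles
                   std::vector<scalar_t*>& a_array,
                   std::vector<scalar_t*>& b_array,
                   std::vector<scalar_t*>& c_array )
{
    for (int64_t i = 0; i < C.mt(); ++i) {
        for (int64_t j = 0; j < C.nt(); ++j) {
            if (C.tileIsLocal( i, j )) {
                a_array.push_back( A( i, 0 ).data() );
                b_array.push_back( B( 0, j ).data() );
                c_array.push_back( C( i, j ).data() );
            }
        }
    }
    // A single batched-BLAS gemm over these arrays then computes
    // C(i, j) = alpha*A(i, 0)*B(0, j) + beta*C(i, j) for every entry,
    // in either ColMajor or RowMajor layout.
}
```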
template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostBatch>,
                            scalar_t alpha,
                            Matrix<scalar_t>& A,
                            Matrix<scalar_t>& B,
                            scalar_t beta,
                            Matrix<scalar_t>& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column (mt tiles by 1 tile), B is a single block row (1 tile by nt tiles), and C is mt tiles by nt tiles.
Host batched implementation.
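Restating the tile shapes above as a formula, this variant performs the block outer-product update

\[
C_{ij} \leftarrow \alpha\, op(A)_{i0}\, op(B)_{0j} + \beta\, C_{ij},
\qquad i = 0:mt-1, \quad j = 0:nt-1,
\]

issuing all local \((i, j)\) updates through one batched-BLAS call on the host.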
template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostNest>,
                            scalar_t alpha,
                            Matrix<scalar_t>& A,
                            Matrix<scalar_t>& B,
                            scalar_t beta,
                            Matrix<scalar_t>& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
Host nested OpenMP implementation.
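A minimal sketch of the nested-loop pattern the name suggests, under the assumption that each local tile of C gets one tile-level gemm; tile_gemm is a hypothetical stand-in, not a SLATE API:

```cpp
// Illustrative pattern only: parallelize over the 2-D tile grid of C
// with a collapsed OpenMP loop, one tile-level gemm per local tile.
// tile_gemm is a hypothetical helper, not SLATE's actual source.
#pragma omp parallel for collapse(2) schedule(dynamic)
for (int64_t i = 0; i < C.mt(); ++i) {
    for (int64_t j = 0; j < C.nt(); ++j) {
        if (C.tileIsLocal( i, j ))
            tile_gemm( alpha, A( i, 0 ), B( 0, j ), beta, C( i, j ) );
    }
}
```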
template <typename scalar_t>
void slate::internal::gemm( internal::TargetType<Target::HostTask>,
                            scalar_t alpha,
                            Matrix<scalar_t>& A,
                            Matrix<scalar_t>& B,
                            scalar_t beta,
                            Matrix<scalar_t>& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
Host OpenMP task implementation.
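In contrast to the nested-loop variant, the task-based pattern can be sketched as one OpenMP task per local tile of C (illustrative only; tile_gemm is again a hypothetical stand-in):

```cpp
// Illustrative pattern only: one OpenMP task per local tile of C, so
// independent tile updates can be scheduled out of order.
for (int64_t i = 0; i < C.mt(); ++i) {
    for (int64_t j = 0; j < C.nt(); ++j) {
        if (C.tileIsLocal( i, j )) {
            #pragma omp task shared( A, B, C )
            tile_gemm( alpha, A( i, 0 ), B( 0, j ), beta, C( i, j ) );
        }
    }
}
#pragma omp taskwait
```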
template <Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemm( scalar_t alpha,
                            Matrix<scalar_t>&& A,
                            Matrix<scalar_t>&& B,
                            scalar_t beta,
                            Matrix<scalar_t>&& C,
                            Layout layout,
                            int priority,
                            int64_t queue_index )
General matrix multiply to update trailing matrix, where A is a single block column and B is a single block row.
Dispatches to target implementations. In the complex case, if \(op(C)\) is transpose, then \(op(A)\) and \(op(B)\) cannot be conj_transpose; if \(op(C)\) is conj_transpose, then \(op(A)\) and \(op(B)\) cannot be transpose.
Parameters
    [in] layout    Indicates the Layout (ColMajor/RowMajor) to operate with. Local tiles of matrix C and corresponding tiles of A and B on target devices will be converted to layout.
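A hedged usage sketch of the dispatcher (internal routines are normally called from within SLATE itself; A_panel, B_row, C_trail, and the chosen scalars are illustrative names, not SLATE code):

```cpp
#include <utility>
#include <slate/slate.hh>

// Illustrative call site: a right-looking trailing-matrix update
// C_trail = C_trail - A_panel * B_row, dispatched to the HostTask target.
template <typename scalar_t>
void update_trailing( slate::Matrix<scalar_t>& A_panel,   // mt tiles x 1 tile
                      slate::Matrix<scalar_t>& B_row,     // 1 tile x nt tiles
                      slate::Matrix<scalar_t>& C_trail )  // mt tiles x nt tiles
{
    slate::internal::gemm<slate::Target::HostTask>(
        scalar_t( -1.0 ), std::move( A_panel ),
                          std::move( B_row ),
        scalar_t(  1.0 ), std::move( C_trail ),
        slate::Layout::ColMajor,
        /* priority    */ 0,
        /* queue_index */ 0 );
}
```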
template <typename scalar_t>
void slate::internal::gemmA( internal::TargetType<Target::Devices>,
                             scalar_t alpha,
                             Matrix<scalar_t>& A,
                             Matrix<scalar_t>& B,
                             scalar_t beta,
                             Matrix<scalar_t>& C,
                             Layout layout,
                             int priority,
                             int64_t queue_index )
General matrix multiply for a left-looking update.
GPU device batched-BLAS implementation. GPU can use either ColMajor or RowMajor.
template <typename scalar_t>
void slate::internal::gemmA( internal::TargetType<Target::HostTask>,
                             scalar_t alpha,
                             Matrix<scalar_t>& A,
                             Matrix<scalar_t>& B,
                             scalar_t beta,
                             Matrix<scalar_t>& C,
                             Layout layout,
                             int priority,
                             int queue_index )
General matrix multiply for a left-looking update.
This routine consists of two steps: management of the memory and the actual computation. Note that it may insert some tiles that must be erased later; the original intent was to erase them when calling the ListReduce routine.

First step: iterate over the tiles of A and fetch the needed tiles of B and C. Where a corresponding tile of C does not exist, acquire it, i.e., create it and insert it as workspace; such tiles are expected to be erased either by the calling routine or through the call to ListReduce.

Second step: as soon as the data is ready, call gemm. However, beta is applied at most once, and only when the current tile of C existed prior to this call; otherwise beta is replaced by zero. In either case, the internal value beta_j is then set to one to allow accumulation into the tile.

Host OpenMP task implementation.
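The beta handling described above can be sketched as follows (illustrative only, not SLATE's source; tile_gemm and the c_existed check are hypothetical stand-ins):

```cpp
// Illustrative beta logic: for each output tile C(i, k), beta is applied
// to the first product only, and only if C(i, k) existed before the call;
// afterwards beta_j = 1 so subsequent products accumulate into the tile.
for (int64_t i = 0; i < A.mt(); ++i) {
    for (int64_t k = 0; k < B.nt(); ++k) {
        bool first = true;
        for (int64_t j = 0; j < A.nt(); ++j) {
            if (! A.tileIsLocal( i, j ))
                continue;
            scalar_t beta_j = scalar_t( 1 );   // accumulate by default
            if (first) {
                // c_existed is a hypothetical placeholder: did C(i, k)
                // exist, or was it just inserted as workspace?
                bool c_existed = true;
                beta_j = c_existed ? beta : scalar_t( 0 );
                first = false;
            }
            tile_gemm( alpha, A( i, j ), B( j, k ), beta_j, C( i, k ) );
        }
    }
}
```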
template <Target target = Target::HostTask, typename scalar_t>
void slate::internal::gemmA( scalar_t alpha,
                             Matrix<scalar_t>&& A,
                             Matrix<scalar_t>&& B,
                             scalar_t beta,
                             Matrix<scalar_t>&& C,
                             Layout layout,
                             int priority,
                             int64_t queue_index )
General matrix multiply for a left-looking update.
Does computation where the A tiles are local. If needed (e.g., not local), inserts C tiles and sets beta to 0. On output, partial products Cik exist distributed wherever Aij is local, for all i = 0 : A.mt, j = 0 : A.nt, k = 0 : B.nt.

Dispatches to target implementations. In the complex case, if \(op(C)\) is transpose, then \(op(A)\) and \(op(B)\) cannot be conj_transpose; if \(op(C)\) is conj_transpose, then \(op(A)\) and \(op(B)\) cannot be transpose.
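In formula form, restating the description above: on output each rank holds the partial products

\[
C_{ik} \;=\; \beta'\, C_{ik} \;+\; \alpha \sum_{j \,:\, A_{ij}\,\text{local}} op(A)_{ij}\, op(B)_{jk},
\]

where \(\beta' = \beta\) if \(C_{ik}\) existed before the call and \(\beta' = 0\) if it was inserted as workspace; a subsequent reduction (e.g., the ListReduce routine mentioned for the HostTask variant) combines the partial products across ranks.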