BLAS++ 2024.05.31
BLAS C++ API
blas::Queue Class Reference

Queue for executing GPU device routines. More...

#include <device.hh>

Public Types

using stream_t = void *
 

Public Member Functions

 Queue ()
 Default constructor.
 
 Queue (int device)
 Constructor with device.
 
 Queue (int device, stream_t &stream)
 
 Queue (Queue const &)=delete
 
Queue & operator= (Queue const &)=delete
 
int device () const
 
void sync ()
 Synchronize with queue.
 
void * work ()
 
template<typename scalar_t >
size_t work_size () const
 
template<typename scalar_t >
void work_ensure_size (size_t lwork)
 Ensures GPU device workspace is of size at least lwork elements of scalar_t, synchronizing and reallocating if needed.
 
void fork (int num_streams=MaxForkSize)
 Forks the kernel launches assigned to this queue to parallel streams.
 
void join ()
 Switch executions on this queue back from parallel streams to the default stream.
 
void revolve ()
 In fork mode, switch execution to the next-in-line stream.
 
void set_stream (stream_t &in_stream)
 
stream_t & stream ()
 

Detailed Description

Queue for executing GPU device routines.

This wraps a CUDA stream and cuBLAS handle, a HIP stream and rocBLAS handle, or a SYCL queue.
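The member list declares the copy constructor and copy assignment as deleted, so a Queue is passed by reference rather than copied. The sketch below is a hypothetical host-only stand-in (`ToyHandleOwner` and `device_of` are illustrative names, not part of BLAS++) showing that pattern for a class that owns an opaque `void *` handle. The fake resource is a heap `int`; the real class presumably owns backend handles instead, which is an assumption here.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in, NOT blas::Queue: owns one opaque handle and,
// like the class documented here, cannot be copied.
class ToyHandleOwner {
public:
    using stream_t = void*;  // same alias shape as blas::Queue::stream_t

    explicit ToyHandleOwner(int device)
        : device_(device), stream_(new int(0))  // fake "stream" resource
    {
        assert(device >= 0);  // toy precondition, not a documented rule
    }

    ~ToyHandleOwner() { delete static_cast<int*>(stream_); }

    // Copying would leave two owners of one handle, so it is disabled.
    ToyHandleOwner(ToyHandleOwner const&) = delete;
    ToyHandleOwner& operator=(ToyHandleOwner const&) = delete;

    int device() const { return device_; }
    stream_t& stream() { return stream_; }

private:
    int device_;
    stream_t stream_;
};

static_assert(!std::is_copy_constructible<ToyHandleOwner>::value,
              "copy construction is deleted");
static_assert(!std::is_copy_assignable<ToyHandleOwner>::value,
              "copy assignment is deleted");

// Functions take the owner by reference instead of by value.
inline int device_of(ToyHandleOwner const& q) { return q.device(); }
```

Passing by reference keeps exactly one owner of the underlying handle for the lifetime of the object.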

Constructor & Destructor Documentation

◆ Queue()

blas::Queue::Queue ( )

Default constructor.

For CUDA and ROCm, creates a Queue on the current device. For SYCL, throws an error. Todo: SYCL has a default device; how should it be used?

Member Function Documentation

◆ fork()

void blas::Queue::fork ( int  num_streams = MaxForkSize)

Forks the kernel launches assigned to this queue to parallel streams.

Limits the actual number of streams to at most MaxForkSize. Forks cannot be nested: each fork must be followed by a join before forking again.

◆ join()

void blas::Queue::join ( )

Switch executions on this queue back from parallel streams to the default stream.

Forks cannot be nested: each fork must be followed by a join before the next fork.

◆ revolve()

void blas::Queue::revolve ( )

In fork mode, switch execution to the next-in-line stream.

In join mode, no effect.
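The fork, revolve, and join descriptions above amount to a small state machine. The following host-only sketch models just that bookkeeping, under stated assumptions: `ToyForkQueue`, `ToyMaxForkSize`, and `ToyDefaultStream` are hypothetical names, the limit of 10 is a placeholder for the real `MaxForkSize`, and clamping a request below 1 up to 1 is a choice of this model. It launches no kernels and is not the BLAS++ implementation.

```cpp
#include <algorithm>
#include <cassert>

// Placeholder limit; the real MaxForkSize is defined by BLAS++ and may differ.
constexpr int ToyMaxForkSize = 10;

// Marker meaning "the default (non-forked) stream" in this model.
constexpr int ToyDefaultStream = -1;

// Host-only model that tracks which stream would receive the next launch.
class ToyForkQueue {
public:
    // Enter fork mode with at most ToyMaxForkSize parallel streams.
    void fork(int num_streams = ToyMaxForkSize) {
        // Forks are not nested; the toy enforces this with an assert.
        assert(!forked_ && "toy model: join() before forking again");
        num_active_ = std::max(1, std::min(num_streams, ToyMaxForkSize));
        current_ = 0;
        forked_ = true;
    }

    // In fork mode, advance round-robin; in join mode, do nothing.
    void revolve() {
        if (forked_)
            current_ = (current_ + 1) % num_active_;
    }

    // Switch back from the parallel streams to the default stream.
    void join() {
        forked_ = false;
        num_active_ = 1;
        current_ = 0;
    }

    // Index among the parallel streams, or ToyDefaultStream when joined.
    int current_stream() const {
        return forked_ ? current_ : ToyDefaultStream;
    }

    int num_active() const { return num_active_; }

private:
    bool forked_ = false;
    int num_active_ = 1;
    int current_ = 0;
};
```

In this model, a typical pattern is `fork()`, then one `revolve()` after each independent launch, then `join()` before work that depends on all of them. Whether the real `join()` also synchronizes the parallel streams is not stated on this page.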

◆ work()

void * blas::Queue::work ( )
inline
Returns
device workspace.

◆ work_ensure_size()

template<typename scalar_t >
void blas::Queue::work_ensure_size ( size_t  lwork)

Ensures GPU device workspace is of size at least lwork elements of scalar_t, synchronizing and reallocating if needed.

Allocates at least 3 * MaxBatchChunk * sizeof(void*), needed for batch gemm.

Parameters
[in]  lwork  Minimum size of workspace, in scalar_t elements.

◆ work_size()

template<typename scalar_t >
size_t blas::Queue::work_size ( ) const
inline
Returns
size of device workspace, in scalar_t elements.
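The relationship between `work_ensure_size`, `work_size`, and `work` can be illustrated with a host-memory model. This is a sketch under assumptions, not the library's code: `ToyWorkspace` and `ToyMaxBatchChunk` are hypothetical names, the value 50000 is a placeholder for the real `MaxBatchChunk`, and a `std::vector` stands in for device memory. The point at which the real queue would synchronize is marked in a comment only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder; the real MaxBatchChunk is defined by BLAS++ and may differ.
constexpr std::size_t ToyMaxBatchChunk = 50000;

// Host-only model of a workspace that grows on demand and never shrinks.
class ToyWorkspace {
public:
    // Ensure room for at least lwork elements of scalar_t, and never less
    // than the floor of 3 * ToyMaxBatchChunk pointers.
    template <typename scalar_t>
    void work_ensure_size(std::size_t lwork) {
        std::size_t floor_bytes = 3 * ToyMaxBatchChunk * sizeof(void*);
        std::size_t needed = std::max(lwork * sizeof(scalar_t), floor_bytes);
        if (needed > bytes_.size()) {
            // A real queue would synchronize here before freeing the old
            // buffer. This model does not preserve old contents.
            bytes_.assign(needed, 0);
            ++reallocations_;
        }
    }

    // Capacity expressed in scalar_t elements, not bytes.
    template <typename scalar_t>
    std::size_t work_size() const {
        return bytes_.size() / sizeof(scalar_t);
    }

    // Pointer to the workspace, or nullptr before the first allocation.
    void* work() { return bytes_.empty() ? nullptr : bytes_.data(); }

    int reallocations() const { return reallocations_; }

private:
    std::vector<unsigned char> bytes_;
    int reallocations_ = 0;
};
```

Because the size is counted in `scalar_t` elements, the same buffer reports a different `work_size` for `float` than for `double`. A small request still yields the pointer-array floor, so repeated small requests do not trigger reallocation in this model.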
