GPUAPI
MID-level API Reference
- class GPUArray
- proc init(ref arr, pitched=false)
Allocates memory on the device. This module computes the allocation size automatically as (arr.size: c_size_t) * c_sizeof(arr.eltType), which means the index space is linearized when arr is multi-dimensional. If arr is 2D and pitched=true, a pitched allocation is performed, and the host and device pitches can be obtained via obj.hpitch and obj.dpitch. Note that the allocated memory is automatically reclaimed when the object is deleted.
- Arguments
arr – A reference to the non-distributed Chapel array that will be mapped onto the device.
pitched – Whether a pitched allocation is performed (default: false).
// Example 1: Non-distributed array
use GPUIterator, GPUAPI;
config const n = 1024;   // problem size (example value)
var A: [1..n] int;
proc GPUCallBack(lo: int, hi: int, N: int) {
  // n * sizeof(int) bytes will be allocated on the device
  var dA = new GPUArray(A);
  // ... (device-side work elided)
}
// GPUIterator
forall i in GPU(1..n, GPUCallBack) {
  // ... (CPU-side computation on A(i) elided)
}

// Example 2: Distributed array
use BlockDist;
var D: domain(1) dmapped blockDist(boundingBox = {1..n}) = {1..n};
var A: [D] int;
proc GPUCallBack(lo: int, hi: int, n: int) {
  // get the local portion of the distributed array;
  // ref (not var) aliases A's local data instead of copying it
  ref localA = A.localSlice(lo..hi);
  // n * sizeof(int) bytes will be allocated on the device
  var dA = new GPUArray(localA);
  // ... (device-side work elided)
}
// GPUIterator
forall i in GPU(D, GPUCallBack) {
  // ... (CPU-side computation on A(i) elided)
}
Note
The allocated memory resides on the current device. When the GPUIterator is used, it sets the current device automatically. Otherwise, it is the user's responsibility to set the current device (e.g., by calling the SetDevice API below); if no device is set, the default device (usually the first GPU) is used.
Note
With distributed arrays, Chapel's localSlice API must be used to get the local portion of the distributed array. When the GPUIterator is used, the bounds of the local portion are already computed and passed as the first two arguments of the callback (lo and hi).
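To make the size formula above concrete, here is a small stand-alone Chapel sketch (it needs neither a GPU nor the GPUAPI module) that evaluates (arr.size: c_size_t) * c_sizeof(arr.eltType) for a 2D array. The 4 x 6 shape is an arbitrary example; the point is that the index space is linearized, so the byte count is simply the number of elements times the element size.

```chapel
use CTypes;

// An example 2D array; Chapel's default int is 64 bits wide.
var A: [1..4, 1..6] int;

// Same formula as GPUArray's automatic size computation: the 4 x 6 index
// space is treated as 24 contiguous elements.
const nbytes: c_size_t = (A.size: c_size_t) * c_sizeof(A.eltType);

writeln(A.size, " elements, ", nbytes, " bytes");   // 24 elements, 192 bytes
```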
- proc toDevice()
Transfers the contents of the Chapel array to the device.
proc GPUCallBack(lo: int, hi: int, n: int) {
  var dA = new GPUArray(A);
  dA.toDevice();
}
- proc fromDevice()
Transfers the contents of the device array back to the Chapel array.
proc GPUCallBack(lo: int, hi: int, n: int) {
  var dA = new GPUArray(A);
  dA.fromDevice();
}
- proc free()
Frees memory on the device.
proc GPUCallBack(lo: int, hi: int, n: int) {
  var dA = new GPUArray(A);
  dA.free();
}
- proc dPtr(): c_ptr(void)
Returns a pointer to the allocated device memory.
- Returns
pointer to the allocated device memory
- Return type
c_ptr(void)
- proc hPtr(): c_ptr(void)
Returns a pointer to the head of the Chapel array.
- Returns
pointer to the head of the Chapel array
- Return type
c_ptr(void)
- proc toDevice(args: GPUArray ...?n)
Utility function that takes a variable number of GPUArray objects and performs the toDevice operation on each.
- proc fromDevice(args: GPUArray ...?n)
Utility function that takes a variable number of GPUArray objects and performs the fromDevice operation on each.
- proc free(args: GPUArray ...?n)
Utility function that takes a variable number of GPUArray objects and performs the free operation on each.
var dA = new GPUArray(A);
var dB = new GPUArray(B);
var dC = new GPUArray(C);
toDevice(dA, dB);
// ... (device-side work elided)
fromDevice(dC);
// GPU memory is automatically deallocated when dA, dB, and dC go out of scope and are deleted.
MID-LOW-level API Reference
- proc Malloc(ref devPtr: c_ptr(void), size: c_size_t)
Allocates memory on the device.
- Arguments
devPtr : c_ptr(void) – Pointer to the allocated device array
size : c_size_t – Allocation size in bytes
// Example 1: Non-distributed array
use CTypes, GPUIterator, GPUAPI;
config const n = 1024;   // problem size (example value)
var A: [1..n] int;
proc GPUCallBack(lo: int, hi: int, N: int) {
  var dA: c_ptr(void);
  Malloc(dA, (A.size: c_size_t) * c_sizeof(A.eltType));
  // ... (device-side work elided)
}
// GPUIterator
forall i in GPU(1..n, GPUCallBack) {
  // ... (CPU-side computation on A(i) elided)
}

// Example 2: Distributed array
use BlockDist;
var D: domain(1) dmapped blockDist(boundingBox = {1..n}) = {1..n};
var A: [D] int;
proc GPUCallBack(lo: int, hi: int, n: int) {
  var dA: c_ptr(void);
  // get the local portion of the distributed array
  ref localA = A.localSlice(lo..hi);
  Malloc(dA, (localA.size: c_size_t) * c_sizeof(localA.eltType));
  // ... (device-side work elided)
}
// GPUIterator
forall i in GPU(D, GPUCallBack) {
  // ... (CPU-side computation on A(i) elided)
}
Note
c_sizeof(A.eltType) returns the size in bytes of an element of the Chapel array A. For more details, refer to the Chapel documentation of c_sizeof.
- proc MallocPitch(ref devPtr: c_ptr(void), ref pitch: c_size_t, width: c_size_t, height: c_size_t)
Allocates pitched 2D memory on the device.
- Arguments
devPtr : c_ptr(void) – Pointer to the allocated pitched 2D device array
pitch : c_size_t – Pitch of the device allocation, set by the runtime
width : c_size_t – The width of the original Chapel array (in bytes)
height : c_size_t – The number of rows (height)
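The arithmetic behind a pitched allocation can be sketched in plain Chapel without a GPU. Here width is computed from the number of columns and the element size, since MallocPitch takes it in bytes. The pitch value of 256 is purely an assumption standing in for whatever the runtime actually returns, and pitchedOffset is a hypothetical helper, not part of GPUAPI.

```chapel
use CTypes;

// Example shape: 100 rows x 30 columns of real (8 bytes each).
const nRows = 100, nCols = 30;
const eltSize: c_size_t = c_sizeof(real);

// Arguments one would pass to MallocPitch: width in bytes, height in rows.
const width: c_size_t = (nCols: c_size_t) * eltSize;
const height: c_size_t = nRows: c_size_t;

// ASSUMPTION: 256 stands in for the pitch chosen by the runtime (>= width).
const pitch: c_size_t = 256;

// Hypothetical helper: byte offset of element (r, c), both 0-based,
// inside the pitched device allocation.
proc pitchedOffset(r: int, c: int): c_size_t {
  return (r: c_size_t) * pitch + (c: c_size_t) * eltSize;
}

const totalBytes = pitch * height;    // bytes reserved on the device
const paddingPerRow = pitch - width;  // unused bytes at the end of each row
writeln(width, " ", totalBytes, " ", paddingPerRow, " ", pitchedOffset(2, 3));
// prints: 240 25600 16 536
```

Each row starts at a multiple of pitch rather than of width, so the same formula applies for any pitch the runtime chooses.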
- proc Memcpy(dst: c_ptr(void), src: c_ptr(void), count: c_size_t, kind: int)
Transfers data between the host and the device.
- Arguments
dst : c_ptr(void) – the destination address
src : c_ptr(void) – the source address
count : c_size_t – size in bytes to be transferred
kind : int – type of transfer (0: host-to-device, 1: device-to-host)
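// Non-distributed array
use CTypes, GPUAPI;
config const n = 1024;   // problem size (example value)
var A: [1..n] int;
proc GPUCallBack(lo: int, hi: int, N: int) {
  const size = (A.size: c_size_t) * c_sizeof(A.eltType);   // bytes to transfer
  var dA: c_ptr(void);
  Malloc(dA, size);
  // host-to-device
  Memcpy(dA, c_ptrTo(A), size, 0);
  // ... (device-side work elided)
  // device-to-host
  Memcpy(c_ptrTo(A), dA, size, 1);
  Free(dA);
}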
Note
c_ptrTo(A) returns a pointer to the first element of the Chapel rectangular array A. For more details, refer to the Chapel documentation of c_ptrTo.
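Because Memcpy receives a single pointer and a byte count, it relies on the elements of a local rectangular array being contiguous in memory. The following GPU-free sketch shows what c_ptrTo(A) points to and how the byte count for a whole-array transfer is derived; the array contents are arbitrary examples.

```chapel
use CTypes;

var A: [1..4] int = [10, 20, 30, 40];

// c_ptrTo(A) points at the first element; the remaining elements follow
// contiguously, which is why one Memcpy of
// A.size * c_sizeof(A.eltType) bytes covers the whole array.
var p = c_ptrTo(A);
const count: c_size_t = (A.size: c_size_t) * c_sizeof(A.eltType);

writeln(p[0], " ", p[3], " ", count);   // 10 40 32
```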
- proc Memcpy2D(dst: c_ptr(void), dpitch: c_size_t, src: c_ptr(void), spitch: c_size_t, width: c_size_t, height: c_size_t, kind: int)
Transfers a pitched 2D array between the host and the device.
- Arguments
dst : c_ptr(void) – the destination address
dpitch : c_size_t – the pitch of the destination memory
src : c_ptr(void) – the source address
spitch : c_size_t – the pitch of the source memory
width : c_size_t – the width of the 2D array to be transferred (in bytes)
height : c_size_t – the height of the 2D array to be transferred (# of rows)
kind : int – type of transfer (0: host-to-device, 1: device-to-host)
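Conceptually, a pitched 2D copy moves height rows, each width bytes long, stepping through the source by spitch bytes and through the destination by dpitch bytes, and leaves the padding at the end of each destination row untouched. The sketch below emulates that behavior on host byte arrays so it runs without a GPU; memcpy2DEmulated is a hypothetical helper, not part of GPUAPI, and the device pitch of 8 bytes is an assumed value.

```chapel
// Hypothetical helper (not part of GPUAPI): host-side emulation of the
// row-by-row copy that Memcpy2D performs. All sizes are in bytes.
proc memcpy2DEmulated(ref dst: [] uint(8), dpitch: int,
                      const ref src: [] uint(8), spitch: int,
                      width: int, height: int) {
  for r in 0..<height do
    for b in 0..<width do
      dst[r*dpitch + b] = src[r*spitch + b];  // bytes past `width` are skipped
}

const rows = 3, width = 5;   // 3 rows of 5 one-byte elements
const spitch = width;        // a dense host array has no row padding
const dpitch = 8;            // ASSUMPTION: stand-in for a device pitch >= width

var src: [0..<spitch*rows] uint(8);
for i in src.domain do src[i] = (i + 1): uint(8);   // 1, 2, ..., 15

var dst: [0..<dpitch*rows] uint(8);                 // zero-initialized

memcpy2DEmulated(dst, dpitch, src, spitch, width, rows);
writeln(dst);   // 1 2 3 4 5 0 0 0 6 7 8 9 10 0 0 0 11 12 13 14 15 0 0 0
```

For a host-to-device transfer of a dense Chapel 2D array, spitch equals width, while dpitch is the pitch returned by MallocPitch.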
- proc Free(devPtr: c_ptr(void))
Frees memory on the device.
- Arguments
devPtr : c_ptr(void) – Device pointer to memory to be freed.
- proc GetDeviceCount(ref count: int(32))
Gets the number of GPU devices on the current locale and stores it in count.
- Arguments
count : int(32) – the number of GPU devices
var nGPUs: int(32);
GetDeviceCount(nGPUs);
writeln(nGPUs);
- proc GetDevice(ref id: int(32))
Gets the ID of the device currently in use and stores it in id.
- Arguments
id : int(32) – the ID of the device currently in use
- proc SetDevice(device: int(32))
Sets the device to be used.
- Arguments
device : int(32) – the ID of the device to be used. device must be (1) greater than or equal to zero and (2) less than the number of GPU devices.
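SetDevice only accepts IDs from 0 up to (number of devices - 1). One common way to satisfy that constraint when several tasks share a locale is to map each task index onto a device round-robin. pickDevice below is a hypothetical helper, not part of GPUAPI, and the device count of 4 is an assumed value; in real code it would come from GetDeviceCount.

```chapel
// Hypothetical helper (not part of GPUAPI): map a non-negative task index onto
// a device ID that satisfies 0 <= id < nGPUs, as SetDevice requires.
proc pickDevice(taskId: int, nGPUs: int(32)): int(32) {
  if taskId < 0 || nGPUs <= 0 then
    halt("pickDevice: need taskId >= 0 and at least one GPU");
  return (taskId % nGPUs): int(32);
}

// ASSUMPTION: 4 devices; real code would obtain this via GetDeviceCount(nGPUs).
const nGPUs: int(32) = 4;
for t in 0..<6 do write(pickDevice(t, nGPUs), " ");   // 0 1 2 3 0 1
writeln();
// With GPUAPI available, each task would then call:
//   SetDevice(pickDevice(t, nGPUs));
```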
- proc ProfilerStart()
NVIDIA GPUs only. Starts profiling with nvprof.
- proc ProfilerStop()
NVIDIA GPUs only. Stops profiling with nvprof.
proc GPUCallBack(lo: int, hi: int, N: int) {
  ProfilerStart();
  // ... (code to be profiled elided)
  ProfilerStop();
}
- proc DeviceSynchronize()
Waits for the device to finish.