Chapel GPU Documentation
Overview
This document describes two Chapel modules that facilitate GPU programming:
GPUIterator: A Chapel iterator that facilitates invoking user-written GPU programs (e.g., CUDA/HIP/OpenCL) from Chapel programs. It also supports hybrid and/or distributed execution, i.e., CPU-only, GPU-only, or X% on the CPU plus Y% on the GPU, on a single CPU+GPU node or across multiple nodes, which helps the user find the best configuration.
GPUAPI: A Chapel-level GPU API that allows the user to perform basic operations such as GPU memory (de)allocation and device-to-host/host-to-device transfers. This module can be used either standalone or together with the GPUIterator module. Currently, the following two tiers of API are provided:
MID-level: Provides Chapel user-friendly GPU API functions.
Example:
var ga = new GPUArray(A);
MID-LOW-level: Provides wrapper functions for raw GPU API functions.
Example:
var ga: c_ptr(void) = Malloc(sizeInBytes);
Also, for categorization purposes, this document uses the term LOW-level to refer to a GPUIterator-only version, in which the GPUIterator is used solely to invoke raw GPU programs and no Chapel-level GPU abstraction is involved.
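To make the "X% for CPU + Y% for GPU" idea concrete, the following minimal sketch splits an iteration range into a CPU portion and a GPU portion by percentage. It is plain Chapel and does not use the GPUIterator module. The names splitRange and cpuPercent are hypothetical and chosen only for illustration, and the rounding policy (integer division, remainder goes to the GPU) is an assumption rather than a description of how GPUIterator partitions work.

```chapel
// Sketch only: divide an iteration space between CPU and GPU by percentage.
// splitRange and cpuPercent are hypothetical names, not part of GPUIterator.
proc splitRange(r: range, cpuPercent: int) {
  const n = r.size;
  // Integer division rounds down, so any remainder is assigned to the GPU.
  const cpuCount = (n * cpuPercent) / 100;
  const cpuPart = r.low .. r.low + cpuCount - 1;  // empty when cpuCount == 0
  const gpuPart = r.low + cpuCount .. r.high;     // empty when cpuPercent == 100
  return (cpuPart, gpuPart);
}

const (cpuPart, gpuPart) = splitRange(1..1000, 25);
writeln("CPU: ", cpuPart, " GPU: ", gpuPart);
```

With the range 1..1000 and 25% assigned to the CPU, the CPU portion is 1..250 and the GPU portion is 251..1000. Setting the percentage to 0 or 100 yields the GPU-only and CPU-only configurations, respectively.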