Guide to Writing GPU Programs¶
General Guidelines¶
In general, GPU programs involve typical host and device operations such as device memory (de)allocations, data transfers, and kernel invocations. Depending on the abstraction level you choose, some of these operations can be written in a Chapel-user-friendly way:
| Level | MID-level | MID-LOW-level | LOW-level |
|---|---|---|---|
| Kernel Invocation | CUDA/HIP/DPC++ | CUDA/HIP/DPC++ | CUDA/HIP/DPC++/OpenCL |
| Memory (de)allocations | Chapel (MID) | Chapel (MID-LOW) | CUDA/HIP/OpenCL |
| Data transfers | Chapel (MID) | Chapel (MID-LOW) | CUDA/HIP/OpenCL |
Note: the LOW, MID-LOW, and MID levels can interoperate with each other.
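To make the table concrete, here is a rough CUDA-only sketch corresponding to the LOW-level column, where the memory (de)allocation, the data transfers, and the kernel invocation are all written in CUDA (the names LaunchInc, inc, hA, and dA are hypothetical). At the MID and MID-LOW levels, the allocation and transfer steps would instead be written in Chapel.

```cuda
#include <cuda_runtime.h>

// Simple element-wise increment kernel.
__global__ void inc(float *d, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) d[id] += 1.0f;
}

extern "C" void LaunchInc(float *hA, int n) {
    float *dA;
    // LOW level: memory (de)allocation and data transfers in CUDA.
    cudaMalloc(&dA, n * sizeof(float));
    cudaMemcpy(dA, hA, n * sizeof(float), cudaMemcpyHostToDevice);
    // Kernel invocation (written in CUDA at every level).
    inc<<<(n + 1023) / 1024, 1024>>>(dA, n);
    cudaMemcpy(hA, dA, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
}
```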
Writing a GPU Program¶
The design and implementation of the CUDA/HIP/OpenCL program invoked from the callback function is entirely up to you. However, be aware that the callback can be invoked multiple times (specifically, the number of GPUs per locale times the number of locales), because the GPUIterator automatically and implicitly handles multiple GPUs and locales. We highly recommend writing your GPU program so that it is 1) device neutral (no explicit device-setting call such as cudaSetDevice()) and 2) flexible with respect to the iteration space, i.e., parameterized by start and end (including its data allocations and transfers), as sketched below.
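For illustration, here is a minimal CUDA sketch of such a launcher, assuming the MID level (Chapel has already allocated the device arrays and transferred the data). The names LaunchVC, vc, dA, and dB are hypothetical; the key points are that the launcher makes no device-setting call and takes start and end, so the GPUIterator can hand it an arbitrary sub-range of the iteration space on any device.

```cuda
#include <cuda_runtime.h>

// Element-wise copy over the sub-range [start, end] only.
__global__ void vc(float *dA, const float *dB, int start, int end) {
    int id = start + blockIdx.x * blockDim.x + threadIdx.x;
    if (id <= end) dA[id] = dB[id];
}

// Launcher called from the Chapel-side callback.
// Note: no cudaSetDevice() here -- the GPUIterator has already
// selected the appropriate device before invoking the callback.
extern "C" void LaunchVC(float *dA, const float *dB, int start, int end) {
    int n = end - start + 1;
    int blockSize = 1024;
    int numBlocks = (n + blockSize - 1) / blockSize;
    vc<<<numBlocks, blockSize>>>(dA, dB, start, end);
    cudaDeviceSynchronize();
}
```

Because the launcher only touches the [start, end] sub-range, the same code works unchanged whether the GPUIterator assigns it the whole iteration space or just a slice of it.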