Guide to Writing GPU Programs

General Guidelines

In general, GPU programs include typical host and device operations: device memory (de)allocations, data transfers, and kernel invocations. Depending on the abstraction level you choose, some of these operations can be written in a Chapel-user-friendly way:

Level                     MID-level        MID-LOW-level      LOW-level
------------------------  ---------------  -----------------  ---------------------
Kernel Invocation         CUDA/HIP/DPC++   CUDA/HIP/DPC++     CUDA/HIP/DPC++/OpenCL
Memory (de)allocations    Chapel (MID)     Chapel (MID-LOW)   CUDA/HIP/OpenCL
Data transfers            Chapel (MID)     Chapel (MID-LOW)   CUDA/HIP/OpenCL

Note

The LOW, MID-LOW, and MID levels can interoperate with each other.
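
To make the table concrete, here is a minimal vector-copy sketch written at the LOW level, where all three rows of the table are handled on the CUDA side. All names in it (vc, vcCUDA, A, B, n) are hypothetical rather than part of any API; at the MID and MID-LOW levels, the allocation and transfer calls marked below would be written in Chapel instead:

    #include <cuda_runtime.h>

    // Kernel Invocation row: the device code itself (hypothetical example).
    __global__ void vc(float *dA, const float *dB, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) dA[i] = dB[i];
    }

    extern "C" void vcCUDA(float *A, const float *B, int n) {
      float *dA, *dB;
      // Memory (de)allocations row: CUDA-side at the LOW level;
      // at the MID/MID-LOW levels these calls move to the Chapel side.
      cudaMalloc(&dA, n * sizeof(float));
      cudaMalloc(&dB, n * sizeof(float));
      // Data transfers row: likewise CUDA-side only at the LOW level.
      cudaMemcpy(dB, B, n * sizeof(float), cudaMemcpyHostToDevice);
      vc<<<(n + 1023) / 1024, 1024>>>(dA, dB, n);
      cudaMemcpy(A, dA, n * sizeof(float), cudaMemcpyDeviceToHost);
      // Freeing also belongs to the Memory (de)allocations row.
      cudaFree(dA);
      cudaFree(dB);
    }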

Writing a GPU Program

The design and implementation of the CUDA/HIP/OpenCL program that is called from the callback function is entirely up to you. However, be aware that the callback can be invoked multiple times (i.e., the number of GPUs per locale times the number of locales), because the GPUIterator automatically and implicitly handles multiple GPUs and locales. We highly recommend writing your GPU program so that it is 1) device neutral (it makes no device-setting calls) and 2) flexible to changes in the iteration space, i.e., the start and end indices it is given (including its data allocations and transfers).
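
As a concrete illustration of these two recommendations, the LOW-level sketch above could be reshaped into a callback target as follows. This is only a sketch under assumptions: it presumes a callback that passes (start, end, gpuN) with gpuN = end - start + 1, mirroring the shape of the GPUIterator examples, and it reuses the hypothetical vc kernel from the previous sketch:

    #include <cuda_runtime.h>

    __global__ void vc(float *dA, const float *dB, int n);  // defined in the sketch above

    // Hypothetical callback target (replacing the fixed-range vcCUDA above):
    // device neutral and parameterized by the subrange [start, end].
    extern "C" void vcCUDA(float *A, const float *B, int start, int end, int gpuN) {
      // gpuN is assumed to equal end - start + 1: this GPU's share of the
      // iteration space. There is deliberately no cudaSetDevice() call here,
      // so the function works regardless of which device the caller selected.
      float *dA, *dB;
      cudaMalloc(&dA, gpuN * sizeof(float));  // size buffers by the chunk,
      cudaMalloc(&dB, gpuN * sizeof(float));  // not by the whole array
      // Offset the host pointers by start so only this chunk is transferred.
      cudaMemcpy(dB, B + start, gpuN * sizeof(float), cudaMemcpyHostToDevice);
      vc<<<(gpuN + 1023) / 1024, 1024>>>(dA, dB, gpuN);
      cudaMemcpy(A + start, dA, gpuN * sizeof(float), cudaMemcpyDeviceToHost);
      cudaFree(dA);
      cudaFree(dB);
    }

Because nothing here depends on a particular device or on the full problem size, the GPUIterator can safely invoke the function once per GPU on every locale, each time with a different start/end pair.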