.. default-domain:: chpl

=============================================
Writing MID-LOW-level programs
=============================================

MID-LOW-level API
######################

The biggest motivation for introducing the ``MID-LOW``- and ``MID``-level GPU APIs is to move some of the low-level GPU operations to the Chapel level. Consider the following GPU callback function and C function:

.. code-block:: chapel
   :caption: vc.hybrid.chpl

   // lo, hi, and N are automatically computed by the GPUIterator
   proc GPUCallBack(lo: int, hi: int, N: int) {
     vcCUDA(A, B, lo, hi, N);
   }

.. code-block:: c
   :caption: vc.cu

   extern "C" {
     void vcCUDA(float* A, float *B, int start, int end, int GPUN) {
       float *dA, *dB;
       cudaMalloc(&dA, sizeof(float) * GPUN);
       cudaMalloc(&dB, sizeof(float) * GPUN);
       cudaMemcpy(dB, B + start, sizeof(float) * GPUN, cudaMemcpyHostToDevice);
       vc<<<ceil(((float)GPUN)/1024), 1024>>>(dA, dB, GPUN);
       cudaDeviceSynchronize();
       cudaMemcpy(A + start, dA, sizeof(float) * GPUN, cudaMemcpyDeviceToHost);
       cudaFree(dA);
       cudaFree(dB);
     }
   }

At the MID-LOW level, most of the CUDA/HIP/OpenCL-level operations, namely 1) device memory allocation, 2) device synchronization, and 3) data transfer, can be written in Chapel. However, it is worth noting that this level of abstraction only provides thin wrapper functions for the CUDA/HIP/OpenCL-level API functions, which requires you to directly manipulate C types such as ``c_ptr(void)``. The MID-LOW level is particularly helpful when you want to fine-tune your use of the GPU API but still want to stick with Chapel.

Here is an example program written with the MID-LOW-level API:

.. code-block:: chapel
   :caption: vc.hybrid.chpl

   proc GPUCallBack(lo: int, hi: int, N: int) {
     var dA, dB: c_ptr(void);
     var size: c_size_t = (lA.size: c_size_t * c_sizeof(lA.eltType));
     Malloc(dA, size);
     Malloc(dB, size);
     Memcpy(dB, c_ptrTo(lB), size, 0); // 0: host-to-device
     LaunchVC(dA, dB, N: c_size_t);
     DeviceSynchronize();
     Memcpy(c_ptrTo(lA), dA, size, 1); // 1: device-to-host
     Free(dA);
     Free(dB);
   }

.. tip:: The MID-LOW-level API can interoperate with the MID-level API; a sketch of such mixing is shown at the end of this page.

.. seealso:: :ref:`MID-LOW-level API Reference`
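
As noted in the tip above, the two levels can be mixed in a single callback. The following is a minimal sketch of one such mix, assuming the MID-level ``GPUArray`` wrapper and its ``dPtr``, ``fromDevice``, and ``free`` routines as one possible shape of that API; consult the MID-level documentation for the exact signatures:

.. code-block:: chapel
   :caption: vc.hybrid.chpl (interoperation sketch, hypothetical)

   proc GPUCallBack(lo: int, hi: int, N: int) {
     // MID-level: let a GPUArray manage the output array (assumed API)
     var dA = new GPUArray(lA);
     // MID-LOW-level: manage the input array by hand
     var dB: c_ptr(void);
     var size: c_size_t = (lB.size: c_size_t * c_sizeof(lB.eltType));
     Malloc(dB, size);
     Memcpy(dB, c_ptrTo(lB), size, 0); // 0: host-to-device
     // The kernel launcher takes raw device pointers in both cases
     LaunchVC(dA.dPtr(), dB, N: c_size_t);
     DeviceSynchronize();
     dA.fromDevice();                  // MID-level device-to-host transfer
     dA.free();                        // MID-level free
     Free(dB);                         // MID-LOW-level free
   }

Here the output array is handled by the MID-level wrapper while the input array is handled with raw MID-LOW-level calls, which can be convenient when only part of a kernel's data needs fine-grained control.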