.. default-domain:: chpl ============================================= Writing MID-level programs ============================================= MID-level API ###################### To reiterate, the biggest motivation for introducing ``MID-LOW`` and ``MID`` -level GPU API is moving some of low-level GPU operations to the Chapel-level. Consider the following GPU callback function and C function: .. code-block:: chapel :caption: vc.hybrid.chpl // lo, hi, and N are automatically computed by the GPUIterator proc GPUCallBack(lo: int, hi: int, N: int) { vcCUDA(A, B, lo, hi, N); } .. code-block:: c :caption: vc.cu extern "C" { void vcCUDA(float* A, float *B, int start, int end, int GPUN) { float *dA, *dB; cudaMalloc(&dA, sizeof(float) * GPUN); cudaMalloc(&dB, sizeof(float) * GPUN); cudaMemcpy(dB, B + start, sizeof(float) * GPUN, cudaMemcpyHostToDevice); vc<<>>(dA, dB, GPUN); cudaDeviceSynchronize(); cudaMemcpy(A + start, dA, sizeof(float) * GPUN, cudaMemcpyDeviceToHost); cudaFree(dA); cudaFree(dB); } } At the MID-level, most of the CUDA/HIP/OpenCL-level 1) device memory allocation, 2) device synchronization, and 3) data transfer can be written in Chapel. Also, unlike the MID-LOW level, the MID-level API is more Chapel programmer-friendly, where you can allocate GPU memory using the ``new`` keyword and no longer need to directly manipulate C types. Here is an example program written with the MID-level API: .. code-block:: chapel :caption: vc.hybrid.chpl proc GPUCallBack(lo: int, hi: int, N: int) { // n * sizeof(int) will be automatically allocated onto the device var dA = new GPUArray(A); var dB = new GPUArray(B); dB.toDevice(); LaunchVC(dA.dPtr(), dB.dPtr(), N: c_size_t); dA.fromDevice(); free(dA, dB); } .. tip:: The MID-level API can interoperate with the MID-LOW-level API. .. seealso:: :ref:`MID-level API Reference`