Compiling and Running Applications

The repository has several example applications in the chapel-gpu/example and chapel-gpu/apps directories, most of which have a distributed version:

Benchmark              Location                      Description                 Note
---------------------  ----------------------------  --------------------------  ----
Vector Copy            example and apps/vector_copy  A simple vector kernel
STREAM                 apps/stream                   A = B + alpha * C
BlackScholes           apps/blackscholes             The Black-Scholes equation
Logistic Regression    apps/logisticregression      A classification algorithm
Matrix Multiplication  apps/mm                       Matrix-matrix multiply
PageRank               apps/mm                       The PageRank algorithm      WIP
N-Queens               WIP                           The n-queens problem        WIP
GPU API Examples       example/gpuapi
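Most of these benchmarks follow the same GPUIterator pattern: a forall loop over the iteration space plus a callback that launches the GPU kernel for the GPU's portion. The following is a minimal sketch of that pattern; the callback name, kernel call, and default values are illustrative, not taken from the repository:

```chapel
use GPUIterator;

config const n = 1024;
config const CPUratio = 50;   // percentage of iterations executed on CPUs

var A, B: [1..n] real(32);

// Invoked by the GPUIterator with the GPU's portion [lo, hi] of 1..n.
// A real application would call an external CUDA/HIP kernel here.
proc GPUCallBack(lo: int, hi: int, N: int) {
  // e.g., vcCUDA(A, B, lo, hi, N);  (hypothetical external kernel)
}

// CPUratio% of the iterations run on CPUs; the rest are handed to the GPU.
forall i in GPU(1..n, GPUCallBack, CPUratio) {
  A(i) = B(i);   // CPU portion of the vector copy
}
```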

Note

This section assumes the Chapel-GPU components are already installed in $CHPL_GPU_HOME. If you have not done so, please see Building Chapel-GPU.

Compiling Applications

The example applications in the chapel-gpu/example and chapel-gpu/apps directories can be built by running make X, where X is one of cuda, hip, opencl, or dpcpp. Please be sure to source the settings script before doing so.

  1. Set environment variables

source $CHPL_GPU_HOME/bin/env.sh
  2. Compile

    • Example 1: chapel-gpu/example (Chapel + GPUIterator + a full GPU program)

      cd path/to/chapel-gpu/example
      make cuda
      or
      make hip
      or
      make opencl
      or
      make dpcpp
      
    • Example 2: chapel-gpu/example/gpuapi (Chapel + GPUAPI + a GPU kernel)

      cd path/to/chapel-gpu/example/gpuapi/2d
      make cuda
      or
      make hip
      or
      make dpcpp
      
    • Example 3: chapel-gpu/apps/stream (Chapel + GPUIterator + GPUAPI + a GPU kernel)

      cd path/to/chapel-gpu/apps/stream
      make cuda
      or
      make hip
      or
      make opencl
      or
      make dpcpp
      

    Note

    A baseline implementation for CPUs can be built by doing make baseline.

  3. Check the generated executables

    For example, make cuda in apps/vector_copy generates the following files:

    Name                        Description                                     Individual make command
    --------------------------  ----------------------------------------------  ----------------------------
    vc.baseline                 A baseline implementation for CPUs              make baseline
    vc.cuda.gpu                 A GPU-only implementation w/o the GPUIterator   make cuda.gpu
    vc.cuda.hybrid              The GPUIterator implementation (single-locale)  make cuda.hybrid
    vc.cuda.hybrid.dist         The GPUIterator implementation (multi-locale)   make cuda.hybrid.dist
    vc.cuda.hybrid.dist.midlow  The MID-LOW implementation (multi-locale)       make cuda.hybrid.dist.midlow
    vc.cuda.hybrid.dist.mid     The MID implementation (multi-locale)           make cuda.hybrid.dist.mid

    Tip

    If you want to compile a specific variant, run make X.Y, where X is one of cuda, hip, opencl, or dpcpp, and Y is one of gpu, hybrid, hybrid.dist, hybrid.dist.midlow, or hybrid.dist.mid (see also the third column above). Note that the MID-LOW and MID variants are currently not supported with OpenCL.

Note

The Makefile internally uses cmake to generate a static library from a GPU source program (vc.cu in this case). Since it is not always trivial to figure out the right options for compiling GPU programs, we outsource that task to cmake. However, when linking a GPU object and the GPUAPI library against a Chapel program, we fall back to make because Chapel is not officially supported in cmake.

If you want to manually compile your Chapel program (say test.chpl) with your GPU program (say gpu.cu), you can do so like this (CUDA for example):

nvcc -c gpu.cu
# Note: gpu.h should contain the declarations of the GPU functions referenced from test.chpl
chpl -M ${CHPL_GPU_HOME}/modules ${CHPL_GPU_HOME}/include/GPUAPI.h gpu.h test.chpl gpu.o -L${CHPL_GPU_HOME}/lib -L${CHPL_GPU_HOME}/lib64 -lGPUAPICUDA_static -L${CUDA_ROOT_DIR}/lib -lcudart
For more details on compiling Chapel programs with external C/C++ programs, please see this.
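On the Chapel side, the declarations in gpu.h are made callable via extern proc. A hedged sketch, using a hypothetical kernel name (myKernel is not from the repository):

```chapel
// Matches a C declaration in gpu.h such as:
//   void myKernel(float *A, float *B, int start, int end, int n);
// (myKernel and its signature are hypothetical, for illustration only.)
extern proc myKernel(A: [] real(32), B: [] real(32),
                     start: int(32), end: int(32), n: int(32));

config const n = 1024;
var A, B: [1..n] real(32);

// Chapel passes the arrays to the C function compiled into gpu.o.
myKernel(A, B, 0: int(32), (n-1): int(32), n: int(32));
```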

Running Applications

Once you have compiled a Chapel-GPU program, you can run it from the command-line:

./vc.cuda.hybrid

Also, many of the example applications accept the --n option, which changes the input size; the --CPUratio (or --CPUPercent) option, which controls the percentage of the iteration space executed on CPUs; and the --output option, which prints the result arrays. For example:

./vc.cuda.hybrid --n=256 --CPUratio=50 --output=1
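These flags are ordinary Chapel config constants: the application declares defaults, and the command line overrides them at launch. A sketch of what such declarations look like (the default values here are illustrative):

```chapel
// --n=256 --CPUratio=50 --output=1 override these at program launch.
config const n = 32;          // input size
config const CPUratio = 0;    // percentage of iterations run on CPUs
config const output = 0;      // nonzero: print the result arrays
```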

For multi-locale execution, please refer to this document.