This lesson is still being designed and assembled (Pre-Alpha version)

Lesson Title: Glossary

Key Points

CuPy and Numba on the GPU
  • CuPy is NumPy, but for the GPU

  • Data is copied from the CPU (host) to the GPU (device), where the computation runs. Afterwards, the result needs to be copied back to the CPU before NumPy and other host-side libraries can use it (see the CuPy sketch after this list)

  • The %timeit magic (in IPython/Jupyter) can be used to benchmark the runtime of GPU-accelerated functions, as shown after this list

  • The performance of GPU-accelerated functions depends on at least four things: 1. input size, 2. compute complexity, 3. CPU/GPU copying, and 4. data type. Concretely, a GPU-accelerated function can be slow because the input is too small, the computation is too simple, data is copied between GPU and CPU excessively, or the input types are larger than necessary (e.g. np.float64 where np.float32 would do)

  • Make GPU-accelerated ufuncs with @numba.vectorize(..., target='cuda'), as sketched below

  • Make CUDA device functions, callable only from other GPU code, with @numba.cuda.jit(device=True), as sketched below
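
A minimal sketch of the CuPy workflow, assuming CuPy is installed and a CUDA-capable GPU is available (the array size is illustrative):

    import numpy as np
    import cupy as cp

    # Host (CPU) data
    x_cpu = np.random.random(10_000_000).astype(np.float32)

    # Copy host -> device; subsequent computation runs on the GPU
    x_gpu = cp.asarray(x_cpu)
    y_gpu = cp.sqrt(x_gpu) * 2.0  # NumPy-like API, executed on the device

    # Copy device -> host so NumPy and other CPU libraries can use the result
    y_cpu = cp.asnumpy(y_gpu)

In IPython/Jupyter, %timeit cp.asnumpy(cp.sqrt(x_gpu)) benchmarks the device computation including the copy back to the host. Because CuPy launches kernels asynchronously, timing only cp.sqrt(x_gpu) without synchronizing first (e.g. with cp.cuda.Device().synchronize()) can understate the true runtime.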

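A minimal sketch of a GPU ufunc built with Numba, assuming Numba with CUDA support; the function body and input size are illustrative. Note that the 'cuda' target requires an explicit type signature:

    import numpy as np
    from numba import vectorize

    @vectorize(['float32(float32, float32)'], target='cuda')
    def gpu_add(a, b):
        return a + b

    a = np.arange(1_000_000, dtype=np.float32)
    b = np.ones_like(a)
    c = gpu_add(a, b)  # inputs copied to device, kernel runs, result copied back

Passing NumPy arrays triggers a host/device round trip on every call; keeping data on the device between calls (e.g. with numba.cuda.to_device) avoids the excessive copying described in the key points above.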

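A minimal sketch of a CUDA device function, with illustrative names; a function compiled with device=True can only be called from GPU code, such as a CUDA-targeted ufunc or kernel, not from the host:

    import math
    from numba import cuda, vectorize

    @cuda.jit(device=True)
    def polar_distance(rho1, theta1, rho2, theta2):
        # Runs on the GPU; callable from kernels and CUDA ufuncs
        return math.sqrt(rho1 ** 2 + rho2 ** 2
                         - 2 * rho1 * rho2 * math.cos(theta1 - theta2))

    @vectorize(['float32(float32, float32, float32, float32)'], target='cuda')
    def polar_distance_ufunc(rho1, theta1, rho2, theta2):
        return polar_distance(rho1, theta1, rho2, theta2)

Calling polar_distance_ufunc on NumPy float32 arrays then works exactly like the ufunc sketched above.
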
Glossary

FIXME