GPU Programming
- “CPUs and GPUs are both useful and each has its own place in our
toolbox”
- “In the context of GPU programming, we often refer to the GPU as the
device and the CPU as the host”
- “Using GPUs to accelerate computation can provide large performance
gains”
- “Using the GPU with Python is not particularly difficult”
- “CuPy provides GPU-accelerated versions of many NumPy and SciPy
functions.”
- “Always have CPU and GPU versions of your code, so that you can
compare performance and validate your results.”
- “Numba can be used to run your own Python functions on the
GPU.”
- “Functions may need to be changed to run correctly on a GPU.”
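The CuPy keypoints above can be sketched in a few lines. This is a minimal, hedged example: it assumes CuPy and a CUDA-capable GPU are available, and otherwise falls back to NumPy on the host, which also provides the CPU version recommended for validating results.

```python
import numpy as np

# Minimal sketch: CuPy mirrors the NumPy API, so the same expression can run
# on the device (GPU) or the host (CPU). Falls back to NumPy when CuPy or a
# GPU is unavailable, which doubles as the CPU version used for validation.
try:
    import cupy as xp          # device arrays (requires a CUDA GPU)
    on_gpu = True
except ImportError:
    xp = np                    # host arrays
    on_gpu = False

size = 1024
a = xp.arange(size, dtype=xp.float32)
b = xp.ones(size, dtype=xp.float32)
c = a + b                      # executes on the GPU when xp is CuPy

# Validate the (possibly) GPU result against a plain CPU computation.
c_cpu = np.arange(size, dtype=np.float32) + 1.0
c_host = xp.asnumpy(c) if on_gpu else c   # copy device -> host if needed
assert np.allclose(c_host, c_cpu)
```

Because CuPy follows the NumPy API, the same `a + b` expression works for both versions; only the array module differs.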
Your First GPU Kernel: Summing Two Vectors in Python, Summing Two Vectors in CUDA, Running Code on the GPU with CuPy, Understanding the CUDA Code, Computing Hierarchy in CUDA, Vectors of Arbitrary Size
- “Precede your kernel definition with the `__global__` keyword”
- “Use built-in variables `threadIdx`, `blockIdx`, `gridDim` and `blockDim` to identify each thread”
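The role of those built-in variables can be sketched in plain Python by looping over a 1D grid. The `run_grid` helper and its argument names are illustrative only, not part of any CUDA API; the point is how `blockIdx * blockDim + threadIdx` gives each thread a unique element to work on.

```python
# Hedged sketch: emulate a 1D CUDA grid in plain Python to show how
# threadIdx, blockIdx and blockDim combine into a unique global index.
# `run_grid` is an illustrative helper, not a CUDA API.
def run_grid(kernel, grid_dim, block_dim, *args):
    """Call `kernel` once per (block, thread) pair, like a 1D CUDA launch."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(thread_idx, block_idx, block_dim, grid_dim, *args)

def vector_add(threadIdx, blockIdx, blockDim, gridDim, a, b, c):
    item = blockIdx * blockDim + threadIdx  # unique index for this thread
    if item < len(c):                       # guard: more threads than items
        c[item] = a[item] + b[item]

a = list(range(10))
b = [10] * 10
c = [0] * 10
run_grid(vector_add, 4, 3, a, b, c)  # 12 "threads" cover 10 elements
```

The `if item < len(c)` guard is the same boundary check a real kernel needs when the grid size is rounded up past the vector length.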
- “Registers can be used to locally store data and avoid repeated
memory operations”
- “Global memory is the main memory space and it is used to share data
between host and GPU”
- “Local memory is a particular type of memory that can be used to
store data that does not fit in registers and is private to a
thread”
- “Shared memory is faster than global memory and local memory”
- “Shared memory can be used as a user-controlled cache to speedup
code”
- “The size of a shared memory array must be known at compile time if it
is allocated inside the kernel”
- “It is possible to declare `extern` shared memory arrays and pass their size during kernel invocation”
- “Use `__shared__` to allocate memory in the shared memory space”
- “Use `__syncthreads()` to wait for shared memory operations to be visible to all threads in a block”
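As a hedged, plain-Python illustration (not CUDA syntax), a block-level sum reduction shows the pattern behind `__shared__` and `__syncthreads()`: every thread writes into a shared buffer, a barrier makes the writes visible, and then the buffer is reduced in a tree. Here each `for t in range(...)` loop stands for "all threads execute this step", and the loop boundaries play the role of the barrier.

```python
# Plain-Python sketch of a block-wide sum reduction through shared memory.
# Each `for t in range(...)` loop means "every thread runs this step"; the
# boundary between loops acts as __syncthreads(): no thread proceeds until
# every write to the shared buffer is visible.
def block_sum(data, block_dim):
    assert block_dim & (block_dim - 1) == 0, "sketch assumes power-of-two block"
    shared = [0] * block_dim              # __shared__ float shared[block_dim];
    for t in range(block_dim):            # each thread sums a strided slice
        shared[t] = sum(data[t::block_dim])
    # ... __syncthreads() barrier here ...
    stride = block_dim // 2
    while stride > 0:                     # tree reduction over the buffer
        for t in range(stride):
            shared[t] += shared[t + stride]
        stride //= 2                      # ... __syncthreads() each round ...
    return shared[0]                      # thread 0 now holds the total

total = block_sum(list(range(16)), block_dim=8)  # -> 120
```

In a real kernel the two barriers are mandatory: without them, some threads would read `shared` before other threads had finished writing it.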
- “Globally scoped arrays whose size is known at compile time can be
stored in constant memory using the `__constant__` identifier”