Conference Tutorials
These tutorials have been offered at various conferences and cover general GPGPU programming techniques as well as introductions to CUDA, OpenCL and old-school GPGPU programming through graphics APIs. Full slide decks and, in some cases, sample code are available. Feel free to use these tutorials in your own work, but please keep the acknowledgements.
- PPAM 2013, Warsaw, Poland: full-day tutorial on Scientific Computing on Accelerators, together with Robert Strzodka (September 2013)
- PPAM 2011, Torun, Poland: full-day tutorial on Scientific Computing on GPUs, together with Jakub Kurzak, Jan-Philipp Weiß, Tim Schröder and André Heidekrüger (September 2011)
- INRIA 2011, Sophia Antipolis, France: half-day tutorial in conjunction with the INRIA Summer School 2011 "Toward petaflop numerical simulation on parallel hybrid architectures", together with Robert Strzodka (June 2011)
- PPAM 2009, Wroclaw, Poland: full-day tutorial on OpenCL and Scientific Computing on GPUs, together with Dominik Behr and Robert Strzodka (September 2009)
- SPEEDUP 2009, EPF Lausanne, Switzerland: full-day tutorial on GPU Computing with NVIDIA CUDA, together with Robert Strzodka and Christian Sigg (September 2009)
- GPU Computing with NVIDIA CUDA: half-day tutorials at the University of Freiburg / Jedox AG (May 2009) and for the Sonderforschungsbereich 708 (Dortmund, June 2009)
- ARCS 2008, Dresden, Germany: full-day tutorial on GPU Computing with NVIDIA CUDA, together with Simon Green and Robert Strzodka (January 2008)
- ICCS 2006, Reading, UK: half-day tutorial on GPU Computing with OpenGL, together with Robert Strzodka (May 2006)
OpenCL and CUDA Sample Code
The sample code demonstrates how a simple vector addition can be implemented in OpenCL and CUDA. The OpenCL version includes contributions by Dirk Ribbrock. These tutorial codes are also featured on http://gpgpu.org and have been made available as part of GPGPU.org's SourceForge project.
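For orientation, the per-element structure that makes vector addition a natural first GPU example can be sketched on the CPU in plain C++. This is not the tutorial's OpenCL or CUDA code: the names `vec_add_element` and `vec_add` and the `block_size` parameter are assumptions made for illustration, and the loop merely stands in for a kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of the per-element "kernel" (hypothetical name). Each index i is
// handled independently, which is what allows a GPU to assign one thread
// (CUDA) or one work-item (OpenCL) to every i.
inline void vec_add_element(const float* a, const float* b, float* c,
                            std::size_t n, std::size_t i) {
    if (i < n) {  // guard: the launched index range may be padded past n
        c[i] = a[i] + b[i];
    }
}

// Serial stand-in for a kernel launch (hypothetical helper). The index range
// is rounded up to a multiple of block_size, as a GPU launch would be when n
// is not divisible by the block size.
std::vector<float> vec_add(const std::vector<float>& a,
                           const std::vector<float>& b,
                           std::size_t block_size = 256) {
    assert(a.size() == b.size());
    assert(block_size > 0);
    const std::size_t n = a.size();
    std::vector<float> c(n);
    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    for (std::size_t i = 0; i < num_blocks * block_size; ++i) {
        vec_add_element(a.data(), b.data(), c.data(), n, i);
    }
    return c;
}
```

Calling `vec_add(a, b, 2)` on two three-element vectors runs over four indices; the bounds check discards the padded one. On a GPU the loop disappears and the index is derived from the thread or work-item ID instead.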
More advanced sample code is provided as part of the PPAM 2013 tutorial.
Old-School GPGPU Coding Tutorials
Back in 2005-2006, I assembled a set of beginners' tutorials on GPGPU linear algebra programming using graphics APIs. What makes these tutorials different from the official GPGPU Hello World (official as in: back in the day) is that they don't even open up a window for display; it's all about offscreen, co-processor-style computing. I keep them here for historical and admittedly sentimental reasons. All tutorials are also featured on http://gpgpu.org and have been made available as part of GPGPU.org's SourceForge project.
- The GPGPU Basic Math Tutorial targets complete beginners in GPU programming. All basic concepts and the programming paradigm are covered. Fully working code samples for the various ways to implement GPGPU codes based on OpenGL are provided. Some background knowledge of the graphics pipeline and of OpenGL is useful, however. The prerequisites section of the tutorial links to other tutorials about these topics. I consider this tutorial essential for anyone starting out in GPGPU.
- The GPGPU Reduction Tutorial extends the concepts introduced in my basic tutorial to reduction-type operations such as maximum/minimum, norms and dot products of vectors represented as textures on the GPU.
- The GPGPU Fast Transfers Tutorial demonstrates how pixel buffer objects (PBOs), an extension to OpenGL, can be used to increase the speed of transfers to and from the graphics card.
- The MRT (multiple render targets) tutorial demonstrates how to output to several arrays in one shader pass on the GPU. This tutorial is currently only available as well-commented source code: [Cg version], [GLSL version].
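The reduction-type operations mentioned in the list above share one pattern: repeated passes, each combining pairs of values and producing an output half the size of its input. A minimal CPU sketch in C++ follows, assuming a 1D array instead of a 2D texture; the names `reduce_passes`, `reduce_max` and `dot` are illustrative and not taken from the tutorial.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Multi-pass reduction (hypothetical helper). Every pass combines pairs of
// elements and halves the active length, mirroring how each GPU pass renders
// into a smaller output. All outputs of one pass are independent of each other.
template <typename Op>
float reduce_passes(std::vector<float> data, Op op) {
    assert(!data.empty());
    std::size_t len = data.size();
    while (len > 1) {
        const std::size_t half = (len + 1) / 2;  // output size of this pass
        for (std::size_t i = 0; i < half; ++i) {
            const std::size_t j = i + half;      // partner element
            if (j < len) {
                data[i] = op(data[i], data[j]);
            }
            // Otherwise len is odd and data[i] is carried over unchanged.
        }
        len = half;
    }
    return data[0];
}

// Maximum of all entries.
float reduce_max(const std::vector<float>& v) {
    return reduce_passes(v, [](float a, float b) { return std::max(a, b); });
}

// Dot product: an element-wise multiply followed by a sum reduction.
float dot(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> prod(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        prod[i] = x[i] * y[i];
    }
    return reduce_passes(prod, [](float a, float b) { return a + b; });
}
```

An array of n values needs about log2(n) passes. The sketch updates the array in place, which is safe here because each pass reads only the upper half and writes only the lower half; a texture-based version would typically alternate between two buffers instead.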
Outdated, Unsupported Old-School Tutorials
My first tutorial ever, the GPGPU Ping Pong Tutorial, is still available. In contrast to the Basic Math Tutorial, it is based on an outdated OpenGL technique called pBuffers and contains less detail. [PDF] [sources using Cg as shader language] [sources using GLSL as shader language]
The GPGPU Performance Tuning Tutorial outlines several steps to increase the performance of a Jacobi iteration (usually to be used as a smoother in multigrid) for banded FEM matrices. The application, and especially the tricks presented, are however completely independent of the FEM background. The implementation is based on pBuffers and therefore fundamentally outdated. [PDF] [sources for the first data layout] [sources for the second data layout]
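To make the operation concrete, here is a minimal CPU sketch in C++ of one damped Jacobi sweep, x_new = x + omega * D^{-1} (b - A x), for a banded matrix. It assumes the simplest banded case, a tridiagonal matrix stored as three arrays; the `Tridiagonal` struct, the `jacobi_sweep` function and the `omega` parameter are illustrative choices and do not reproduce the data layouts or the FEM matrices used in the tutorial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tridiagonal matrix stored band by band (hypothetical layout).
// lower[0] and upper[n-1] lie outside the matrix and are ignored.
struct Tridiagonal {
    std::vector<float> lower, diag, upper;
};

// One damped Jacobi sweep: x_new = x + omega * D^{-1} (b - A x).
// Every x_new[i] depends only on the old iterate x, so all rows can be
// updated independently. Diagonal entries are assumed to be nonzero.
std::vector<float> jacobi_sweep(const Tridiagonal& A,
                                const std::vector<float>& b,
                                const std::vector<float>& x,
                                float omega = 1.0f) {
    const std::size_t n = x.size();
    assert(b.size() == n);
    assert(A.lower.size() == n && A.diag.size() == n && A.upper.size() == n);
    std::vector<float> x_new(n);
    for (std::size_t i = 0; i < n; ++i) {
        float Ax = A.diag[i] * x[i];                   // row i of A times x
        if (i > 0)     Ax += A.lower[i] * x[i - 1];
        if (i + 1 < n) Ax += A.upper[i] * x[i + 1];
        x_new[i] = x[i] + omega * (b[i] - Ax) / A.diag[i];
    }
    return x_new;
}
```

Storing each band as its own array means a sweep is a handful of element-wise vector operations with shifted indices, which is the property that lets the computation map onto one GPU pass per sweep. With `omega = 1` this is the plain Jacobi method; smoothers in multigrid often use a value below 1.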