Jos Martin

At the NVIDIA GPU Technology Conference (GTC) in September this year, several ISVs announced extensive support for GPU-based functionality. A reasonable question that arises from these announcements is ‘why now’? GPUs have already found favour in a number of computational disciplines, such as Computational Fluid Dynamics (CFD) and finite element analysis. However, it is only relatively recently that they have offered sufficient computational ability to be ready for mainstream use. What did they lack? Most importantly, they lacked support for double-precision arithmetic and conformance with IEEE single- and double-precision behaviour.

With the advent of the newer NVIDIA cards (compute capability 1.3 and later) this hurdle has been overcome, and this is why ISVs are introducing GPU functionality now.

As important as the ‘why now’ question is ‘why should I care’? The answer is that many computations that engineers use to simulate their models and analyze their data run significantly faster on GPUs than on CPUs. Great examples are Fast Fourier Transforms, where you can see computational speed-ups of 10x–30x in double precision (with even greater speed-ups in single precision), and linear algebra functionality such as matrix multiplication (around 7x) and backslash (around 5x).
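To make the comparison concrete, here is a minimal timing sketch in MATLAB, assuming the Parallel Computing Toolbox `gpuArray` interface and a supported NVIDIA device are available; the matrix size and the measured speed-up will of course vary with hardware.

```matlab
% Hedged sketch: timing an FFT on the CPU versus the GPU.
% Assumes Parallel Computing Toolbox and a CUDA-capable device.
A = rand(4096, 'double');                 % data in host memory
tic; fft2(A); tCPU = toc;

G = gpuArray(A);                          % copy the data to the GPU
tic; F = fft2(G); wait(gpuDevice); tGPU = toc;  % wait for the GPU to finish

result = gather(F);                       % bring the result back to the host
fprintf('CPU: %.3fs  GPU: %.3fs  speed-up: %.1fx\n', tCPU, tGPU, tCPU/tGPU);
```

Note the explicit `wait(gpuDevice)` before stopping the timer: GPU operations launch asynchronously, so without it the measurement would record only the kernel launch, not the computation itself.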

There are obviously many other computations that can be accelerated. The goal of any package that uses GPUs is to make this computational ability available to the user.

However, it isn’t sufficient to just have computation available. It is also vitally important to make this functionality easy to use. Writing native CUDA code is not something that most engineers would consider a good use of their time (in fact, even most seasoned programmers find this task very challenging). Engineers need a higher-level language that provides access to the GPU. This is where significant trade-offs are made: the simpler the application programming interface (API), the less functionality will be available. Conversely, the more GPU functionality exposed, the more difficult the API is to use.

This problem can be overcome with several different levels of API that work together seamlessly. The simplest of these APIs should provide the features that almost all users want; for example, maths functions and the ability to create and manipulate data on the GPU. The next level should build on the general features provided by the first level, and extend the detail and functionality available. This API would be slightly more difficult to use, but would provide greater control. At the very lowest level, a user may need direct access to CUDA kernels and other more complex programming structures to enable integration with legacy applications or direct control of the device.
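The three levels described above can be sketched using MATLAB's Parallel Computing Toolbox naming (`gpuArray`, `arrayfun`, `parallel.gpu.CUDAKernel`); the kernel file names in the last step are hypothetical placeholders.

```matlab
% Hedged sketch of the three API levels described above.

% Level 1: overloaded maths functions on GPU data -- no GPU knowledge needed.
x = gpuArray.rand(1e6, 1);
y = fft(x);

% Level 2: user-written element-wise functions compiled for the GPU,
% slightly more work but finer control over the computation.
f = @(a, b) a .* exp(-b);          % any scalar function of the inputs
z = arrayfun(f, x, y);

% Level 3: direct access to a hand-written CUDA kernel
% (file names here are hypothetical).
k = parallel.gpu.CUDAKernel('myKernel.ptx', 'myKernel.cu');
k.ThreadBlockSize = 256;
out = feval(k, zeros(1e6, 1, 'gpuArray'), x);
```

At level 1 the user never leaves familiar maths syntax; at level 3 they take on responsibility for thread-block sizes and memory layout, which is exactly the functionality-versus-simplicity trade-off the text describes.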

The challenge for both hardware vendors and ISVs is to work together to provide such a system. Since this technology is still in its infancy, there are significant differences across the hardware and software stack, and this leads to a proliferation of different solutions. Competing solutions should be judged on how easily they solve the wide range of problems that engineers encounter. Read more about GPUs on MathWorks’ site.