Nvidia has released the first production version of its Compute Unified Device
Architecture (Cuda) technology, designed for developers creating computationally
intensive applications optimised for the firm's GPUs.
GPUs lend themselves to highly parallelised computational tasks owing to the
architectural differences between standard CPUs and GPUs, but until now it has
been very difficult to harness that processing power for anything that is not
graphical.
Whereas previous-generation GPUs were based on 'streaming shader programs',
programmers can now use Cuda to create programs that use many threads to operate
on large quantities of data in parallel, Nvidia claimed.
In contrast to multi-core CPUs, where only a few threads execute at the same
time, GPUs process thousands of threads simultaneously, enabling high
computational throughput across large amounts of data.
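As a rough illustration of this many-threads model, the sketch below adds two arrays with one GPU thread per element. It is a minimal example written against the publicly documented Cuda runtime API, not code from Nvidia's announcement; the names vec_add and run_vec_add, the array size and the block size of 256 are assumptions chosen for the example.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a Cuda runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "Cuda error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

// Each thread computes exactly one output element.
__global__ void vec_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may have spare threads
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: runs the kernel on n elements and returns the
// largest absolute difference from the result computed on the CPU.
float run_vec_add(int n) {
    std::vector<float> a(n), b(n), out(n);
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * static_cast<float>(i);
    }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMalloc(&d_out, bytes));
    CUDA_CHECK(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;                           // assumed block size
    int blocks = (n + threads - 1) / threads;    // enough blocks to cover n
    vec_add<<<blocks, threads>>>(d_a, d_b, d_out, n);
    CUDA_CHECK(cudaGetLastError());

    // This copy waits for the kernel to finish before reading the result.
    CUDA_CHECK(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_out));

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_err = std::fmax(max_err, std::fabs(out[i] - (a[i] + b[i])));
    }
    return max_err;
}

int main() {
    int n = 1 << 20;  // about a million elements, one thread each
    std::printf("max error: %f\n", run_vec_add(n));
    return 0;
}
```

With these sizes the launch creates 4,096 blocks of 256 threads, so roughly a million lightweight threads are handed to the GPU's scheduler in a single call, which is the contrast with a handful of CPU threads that the comparison draws.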
The Cuda toolkit offers a standard C interface for programming Cuda-enabled
GPUs, which include almost all of Nvidia's range, from the GeForce cards to its
new range of Tesla computational GPUs.
The toolkit includes standard FFT and BLAS libraries, a C compiler for the
Nvidia GPU and a runtime driver.
The Cuda runtime driver is a standalone driver that interoperates with OpenGL
and Microsoft DirectX drivers. Cuda is currently supported on the Linux and
Windows XP operating systems.
For advanced research and language development, there is also a low-level
assembly language layer and driver interface.
The Cuda technology also allows threads on Nvidia GPUs to co-operate when
solving a problem.
GPUs featuring Cuda technology have an on-chip parallel data cache that
developers can use to store frequently used information directly on the GPU.
This allows computing threads to instantly share information rather than wait
for data from much slower, off-chip DRAM.
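The sketch below shows one common way threads can co-operate through this on-chip store, which the Cuda programming model exposes as `__shared__` memory. Each block of threads copies its slice of the input into shared memory once, then sums it in place using a tree reduction. This is an illustrative pattern rather than anything taken from Nvidia's release; the names block_sum and sum_with_shared_memory and the block size of 256 are assumptions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a Cuda runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "Cuda error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

constexpr int kBlockSize = 256;  // assumed; must be a power of two here

// Each block sums its slice of the input and writes one partial result.
__global__ void block_sum(const float* in, float* partial, int n) {
    __shared__ float cache[kBlockSize];  // on-chip, shared by the block

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    cache[tid] = (i < n) ? in[i] : 0.0f;  // one off-chip read per thread
    __syncthreads();                      // wait until the cache is full

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            cache[tid] += cache[tid + stride];
        }
        __syncthreads();  // every thread must reach this barrier
    }
    if (tid == 0) {
        partial[blockIdx.x] = cache[0];
    }
}

// Hypothetical helper: sums n ones on the GPU and returns the total.
float sum_with_shared_memory(int n) {
    std::vector<float> in(n, 1.0f);
    int blocks = (n + kBlockSize - 1) / kBlockSize;
    std::vector<float> partial(blocks);

    float *d_in = nullptr, *d_partial = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d_partial, blocks * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_in, in.data(), n * sizeof(float),
                          cudaMemcpyHostToDevice));

    block_sum<<<blocks, kBlockSize>>>(d_in, d_partial, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_partial));

    // Finish on the CPU: add up the per-block partial sums.
    double total = 0.0;
    for (float p : partial) total += p;
    return static_cast<float>(total);
}

int main() {
    std::printf("sum: %f\n", sum_with_shared_memory(100000));
    return 0;
}
```

After the initial load, every step of the reduction reads and writes only the shared array, so the repeated accesses never return to off-chip memory. The `__syncthreads()` barriers are what make the sharing safe: no thread reads a neighbour's value before it has been written.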
The new Cuda release can be downloaded from Nvidia's developer website.