Cudagrind - Memory Transaction Checking for CUDA
Valgrind, and specifically the included tool Memcheck, offers an easy and reliable way to check the correctness of memory operations in programs. It works non-intrusively: Valgrind translates the program into intermediate code and executes it on an emulated CPU. The heavyweight tool Memcheck uses this to keep a full shadow copy of the memory used by a program and to track accesses to it. This allows it to detect memory leaks and to check the validity of accesses.
Though suited to a wide variety of programs, this approach fails when accelerator-based programming models are involved. The code running on these devices is separate from the code running on the host. Access to device memory and the launching of kernels are handled by an API provided by the driver in use. Hence Valgrind is unable to understand and instrument operations that run on the device.
The Cudagrind tool has been developed to circumvent this limitation. It provides a set of wrappers covering a subset of the CUDA Driver API functions responsible for memory operations. These wrappers make it possible to check whether memory is fully allocated during a transfer and, through the functionality provided by Valgrind, whether memory transferred from the host to the device is defined and addressable. This technique detects a number of common programming mistakes that are very difficult to debug by other means. Cudagrind also detects certain types of race conditions connected to memory operations between host and device.
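To make the "fully allocated during a transfer" check more concrete, the following is a minimal sketch in C of the kind of bookkeeping such a wrapper could do. It is not taken from Cudagrind's source. The names dev_alloc, track_alloc and transfer_is_fully_allocated, as well as the fixed-size table, are hypothetical and chosen only for illustration. The sketch assumes that every device allocation is recorded with its base address and size, and that a transfer is valid only if its whole byte range lies inside a single recorded allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping entry: one per live device allocation. */
typedef struct {
    uintptr_t base;  /* device address handed out by the allocator */
    size_t    size;  /* size of the allocation in bytes */
} dev_alloc;

#define MAX_ALLOCS 64
dev_alloc allocs[MAX_ALLOCS];
size_t    n_allocs = 0;

/* Record an allocation. A wrapper would call this after a successful
 * device allocation. Returns 0 on success, -1 if the table is full. */
int track_alloc(uintptr_t base, size_t size)
{
    assert(size > 0);
    if (n_allocs == MAX_ALLOCS)
        return -1;
    allocs[n_allocs].base = base;
    allocs[n_allocs].size = size;
    n_allocs++;
    return 0;
}

/* Return 1 if the byte range [ptr, ptr + bytes) lies entirely inside
 * one tracked allocation, 0 otherwise. The comparison is written in
 * terms of offsets so that ptr + bytes is never computed and cannot
 * overflow. */
int transfer_is_fully_allocated(uintptr_t ptr, size_t bytes)
{
    for (size_t i = 0; i < n_allocs; i++) {
        const dev_alloc *a = &allocs[i];
        if (ptr < a->base)
            continue;
        uintptr_t offset = ptr - a->base;
        if (offset <= a->size && bytes <= a->size - offset)
            return 1;
    }
    return 0;
}
```

In this sketch a wrapper around a copy function would call transfer_is_fully_allocated with the device pointer and byte count before forwarding the call to the driver, and report an error if it returns 0. The host side of the same transfer is where Valgrind's own definedness and addressability checks would come in.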
Cudagrind has been tested with, and should run on, the following configuration:
- Valgrind 3.6.0 or higher
- CUDA SDK 4.0 or higher and compatible CUDA driver
- GCC 4.4.7 or compatible
Other compilers should work as long as they support the C99 standard and GCC's constructor/destructor function attribute extension.
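For readers unfamiliar with the GCC constructor/destructor extension mentioned above, here is a small sketch of how it behaves. It is an illustration only and not Cudagrind code. The names tool_init, tool_fini, tool_is_ready, tool_record_event and tool_event_count are hypothetical. A function marked with the constructor attribute runs automatically when the shared library or program is loaded, before main(). One marked with the destructor attribute runs at unload or normal exit. This lets a preloaded library set up and tear down its state without any change to the checked program.

```c
#include <assert.h>

static int tool_ready  = 0;  /* hypothetical "library is initialised" flag */
static int event_count = 0;  /* hypothetical counter of recorded events */

/* Runs automatically at load time, before main() is entered. */
__attribute__((constructor))
static void tool_init(void)
{
    tool_ready = 1;
}

/* Runs automatically at unload or normal program exit.
 * A real tool might print a summary or release resources here. */
__attribute__((destructor))
static void tool_fini(void)
{
    tool_ready = 0;
}

/* Entry points can rely on the constructor having already run. */
int tool_is_ready(void)
{
    return tool_ready;
}

void tool_record_event(void)
{
    assert(tool_ready);  /* would fail only if the constructor never ran */
    event_count++;
}

int tool_event_count(void)
{
    return event_count;
}
```

A compiler without this attribute would leave tool_ready at 0 unless the program called an initialisation function explicitly, which is why support for the extension is listed as a requirement.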
To compile the tool, download the most recent tarball and make sure the include and library directories of both Valgrind and the CUDA driver are set properly in the Makefile. Then call 'make new' to build the dynamic Cudagrind library.
To check a program with Cudagrind, preload Cudagrind's 'libcudagrind.so' and the CUDA driver's 'libcuda.so', in this order, and run the program under Valgrind (more specifically, under the Valgrind tool Memcheck).
Here you can download the latest beta version of Cudagrind: Cudagrind-0.9.4.tgz