Tool for analysing and checking MPI programs
What is MARMOT?
Debugging MPI programs is often frustrating:
- How do you debug a program that runs on 512 nodes and crashes after several hours of runtime?
- How do you detect errors caused by incorrect usage of MPI?
MARMOT monitors the MPI calls an application makes and automatically checks, at runtime, that these calls and their arguments are used correctly. It does not replace classical debuggers but can be used alongside them.
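To make the idea of checking call arguments at runtime concrete, here is a minimal sketch in plain C. It is not MARMOT's code or API: the function `check_send_args`, the `check_result` enum, and the particular rules are hypothetical, chosen only to illustrate the kind of test a checker could apply to the arguments of a blocking send. The caller is assumed to supply the communicator size and the tag upper bound.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes; not taken from MARMOT. */
typedef enum { CHECK_OK, CHECK_WARNING, CHECK_ERROR } check_result;

/*
 * Sketch of argument checks for a blocking send.
 * Assumptions: ranks are numbered 0..comm_size-1, valid tags are
 * 0..tag_ub, and special values such as MPI_PROC_NULL are ignored
 * for simplicity.
 */
check_result check_send_args(int count, int dest, int tag,
                             int my_rank, int comm_size, int tag_ub)
{
    assert(comm_size > 0);

    if (count < 0) {
        fprintf(stderr, "ERROR: negative count %d\n", count);
        return CHECK_ERROR;
    }
    if (dest < 0 || dest >= comm_size) {
        fprintf(stderr, "ERROR: destination rank %d out of range\n", dest);
        return CHECK_ERROR;
    }
    if (tag < 0 || tag > tag_ub) {
        fprintf(stderr, "ERROR: tag %d out of range\n", tag);
        return CHECK_ERROR;
    }
    if (dest == my_rank) {
        /* Legal, but a blocking send to oneself may never complete. */
        fprintf(stderr, "WARNING: rank %d sends to itself\n", my_rank);
        return CHECK_WARNING;
    }
    return CHECK_OK;
}
```

A real checker would also have to inspect datatypes, communicators, and request handles, which this sketch leaves out.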
We have finished a first development release. MARMOT currently supports the C and Fortran language bindings of version 1.2 of the MPI standard. MARMOT is a library that is linked to the MPI application in addition to the existing MPI library and that allows a detailed analysis of the application at runtime. It generates a human-readable log file:
- Violations of the MPI standard are reported as errors.
- Unusual behaviour or possible problems are reported as warnings.
- Notes are displayed when harmless but noteworthy behaviour occurs.
- The MPI calls are traced on each node throughout the whole application.
- When a deadlock is detected, the last few calls (as configured by the user) can be traced back on each node.
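One common way to keep "the last few calls" available for a trace-back is a fixed-size ring buffer per process. The sketch below shows that technique in plain C. It is an illustration only: the type `call_history`, the functions `history_init`, `history_record`, `history_get`, and `history_dump`, and the output format are all hypothetical and are not taken from MARMOT.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HISTORY  16   /* assumed upper bound on the trace depth */
#define MAX_CALL_LEN 64   /* assumed maximum length of a call description */

/* Hypothetical per-process history of the most recent MPI calls. */
typedef struct {
    char calls[MAX_HISTORY][MAX_CALL_LEN];
    int depth;  /* number of calls to keep, chosen by the user */
    int next;   /* slot that the next recorded call is written to */
    int count;  /* number of calls currently stored (<= depth) */
} call_history;

void history_init(call_history *h, int depth)
{
    assert(depth > 0 && depth <= MAX_HISTORY);
    h->depth = depth;
    h->next = 0;
    h->count = 0;
}

/* Record one call, overwriting the oldest entry once the buffer is full. */
void history_record(call_history *h, const char *call)
{
    snprintf(h->calls[h->next], MAX_CALL_LEN, "%s", call);
    h->next = (h->next + 1) % h->depth;
    if (h->count < h->depth)
        h->count++;
}

/* Return the i-th retained call (0 = oldest), or NULL if out of range. */
const char *history_get(const call_history *h, int i)
{
    if (i < 0 || i >= h->count)
        return NULL;
    int start = (h->next - h->count + h->depth) % h->depth;
    return h->calls[(start + i) % h->depth];
}

/* Print the retained calls, oldest first, e.g. after a suspected deadlock. */
void history_dump(const call_history *h, int rank, FILE *out)
{
    for (int i = 0; i < h->count; i++)
        fprintf(out, "rank %d: %s\n", rank, history_get(h, i));
}
```

With a depth of 3, recording four calls in sequence leaves only the last three in the buffer, and `history_dump` prints them oldest first.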
MARMOT is intended to be a portable tool that runs on any platform and with any MPI implementation. It has been tested on:
- Linux IA32/IA64/AMD64/EM64T with MPICH, MPICH-G2, LAM, Open MPI, and many others
- NEC SX-8, SX-6
- Windows Server 2003, 2008 with MSMPI v1 and v2
- IBM Regatta
- Cray T3E
- SGI Altix 4700