Strategies for dealing with memory leaks

Run to the end

A straightforward approach to a general memory health check is to run the program as usual but stop it just before it exits. This is a one-click selection in MemoryScape. At that point you can inspect the condition of heap memory through the memory leak report, memory statistics, and similar views. You can also view the color-coded heap graphically as a quick scan for any remaining allocations. As a best practice, treat any allocation still present at this point as a candidate for deallocation.

MemoryScape's Heap Graphical View with memory leaks

Using Valgrind to look for memory problems

Valgrind is a widely used tool for memory debugging. Its Memcheck tool tracks allocations and deallocations and reports mismatches and leaks. It can generally point you to the line in the code where the lost memory was allocated.

Valgrind can catch memory errors at a fairly fine granularity of memory access. Note that programs run significantly more slowly under Valgrind and use more memory (even twice as much) with the Memcheck tool. This overhead can limit or even rule out its use in some environments.

Valgrind's Memory Leak Report output

Using memscript for leak checking

Whether or not a program is suspected of leaking memory, your testing process should include a simple health check. You can do this efficiently through MemoryScape’s memscript capabilities. Memscript provides a script for running the program unattended, with memory leak checking and reporting of any leaks found, in particular on completion, just before termination. This approach lets you flag otherwise healthy codes for review and correction long before any Out of Memory (OOM) problems arise.

Using MemoryScape to look at memory usage across a cluster

Memory leaks can be a major challenge for scalable applications running on HPC resources. One of our customers reports that 50% of their failures in supercomputing jobs are due to nodes being out of memory.  Some of that may be due to a lack of optimization and planning, but when your memory usage estimates indicate that things should be fine and the job still fails with an Out of Memory error, you need a strategy for trying to understand what is going on.

You need to be able to track and compare the memory behavior of each of the processes in your distributed parallel program, and to catch memory leaks when they are small, before they have bloated your application to the point that it risks triggering the Out of Memory error.

You can do this with MemoryScape, or with TotalView and its integrated memory debugging. The advantage of using TotalView is that you can exercise explicit control over the program and keep it from running too far ahead.

  1. Run your program under MemoryScape or in TotalView with memory debugging enabled.
  2. Use the memory statistics to see if all of your processes are behaving similarly or if some processes are using more or less memory than others.
  3. Select a process to focus on. Pick one at random if they’re all using more memory than expected or are monotonically growing. Otherwise, focus on processes that are unexpectedly using more memory than their peers.
  4. Pause all of the processes and do leak detection on the selected process.
  5. Check early in the run: MemoryScape can detect leaks while they are still very small.

Resource: “Memory Debugging in Parallel and Distributed Applications.”

Using tvscript to automate leak checking within loops

Automated scripting with tvscript can be extremely useful when tracking down suspect memory management behavior, especially when the code contains repetitive loops that may enclose and conceal leaks. By scripting a leak check within the loop cycles of a program, you can monitor memory activity over a period of time that would be impractical in an interactive session.
