HOWTO use Valgrind to debug memory problems

From Nsnam

Revision as of 17:29, 11 October 2013

Memory leaks and other memory errors can be found with Valgrind. The ./waf build system has built-in support for running an ns-3 program under valgrind:

 ./waf --command-template="valgrind [options] %s" --run ns-3-program-name

for example:

 ./waf --command-template="valgrind --leak-check=full --show-reachable=yes %s" --run main-propagation-loss

Supported platforms

Valgrind for ns-3 is known to work on recent Linux systems that do not have gtk enabled. In particular, some third-party libraries that are loaded when gtk is enabled appear to leak memory.

To disable gtk (even if it is on your system), provide the "--disable-gtk" option at configure time:

 ./waf configure --disable-gtk --enable-examples

An example of clean valgrind output for the "first" tutorial example is below:

 ./waf --valgrind --run first
 Waf: Entering directory `/path/to/ns-3-dev/build'
 Waf: Leaving directory `/path/to/ns-3-dev/build'
 'build' finished successfully (2.510s)
 ==29514== Memcheck, a memory error detector
 ==29514== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
 ==29514== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
 ==29514== Command: /path/to/ns-3-dev/build/examples/tutorial/ns3-dev-first-debug
 ==29514== 
 At time 2s client sent 1024 bytes to 10.1.1.2 port 9
 At time 2.00369s server received 1024 bytes from 10.1.1.1 port 49153
 At time 2.00369s server sent 1024 bytes to 10.1.1.1 port 49153
 At time 2.00737s client received 1024 bytes from 10.1.1.2 port 9
 ==29514== 
 ==29514== HEAP SUMMARY:
 ==29514==     in use at exit: 0 bytes in 0 blocks
 ==29514==   total heap usage: 4,122 allocs, 4,122 frees, 259,597 bytes allocated
 ==29514== 
 ==29514== All heap blocks were freed -- no leaks are possible
 ==29514== 
 ==29514== For counts of detected and suppressed errors, rerun with: -v
 ==29514== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
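
When the check has to be scripted, the two lines that matter in a clean report are "in use at exit: 0 bytes in 0 blocks" and "ERROR SUMMARY: 0 errors from 0 contexts". The following is a minimal sketch, not part of ns-3: the function name valgrind_log_is_clean and the log file name first.log are hypothetical, and it assumes the report was saved with valgrind's --log-file option.

```shell
# Hypothetical helper (not part of ns-3 or valgrind): succeeds only if a
# saved valgrind log reports zero errors and no heap blocks left at exit.
valgrind_log_is_clean() {
    log="$1"
    grep -q 'ERROR SUMMARY: 0 errors from 0 contexts' "$log" &&
        grep -q 'in use at exit: 0 bytes in 0 blocks' "$log"
}

# Possible usage, assuming the report was written to first.log:
#   ./waf --command-template="valgrind --leak-check=full --log-file=first.log %s" --run first
#   valgrind_log_is_clean first.log && echo "clean"
```

A relative log file name is resolved against the directory the program runs in, so an absolute path may be easier to find afterwards.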

If gtk is not disabled, you may get notifications of leaks that exist in third-party libraries, such as:

 ==891== 2,072 bytes in 1 blocks are still reachable in loss record 242 of 246
 ==891==    at 0x4C28409: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
 ==891==    by 0xE010EFA: ??? (in /usr/lib64/libpixman-1.so.0.30.0)
 ==891==    by 0xE029ABF: ??? (in /usr/lib64/libpixman-1.so.0.30.0)
 ==891==    by 0xE0001F5: ??? (in /usr/lib64/libpixman-1.so.0.30.0)
 ==891==    by 0xE01160B: ??? (in /usr/lib64/libpixman-1.so.0.30.0)
 ==891==    by 0xDFC3508: ??? (in /usr/lib64/libpixman-1.so.0.30.0)
 ==891==    by 0x400F4F2: _dl_init (in /usr/lib64/ld-2.17.so)
 ==891==    by 0x4001459: ??? (in /usr/lib64/ld-2.17.so)
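
If disabling gtk is not an option, one possible workaround is a valgrind suppression file that hides reports like the one above. This is a sketch under assumptions: the file name gtk-libs.supp and the suppression name are made up for illustration, and the pattern only covers blocks allocated by malloc called directly from libpixman, as in this trace.

```shell
# Write a hypothetical suppression file; the file name and the
# suppression name on its first line are illustrative only.
cat > gtk-libs.supp <<'EOF'
{
   pixman-init-still-reachable
   Memcheck:Leak
   fun:malloc
   obj:*/libpixman-1.so*
}
EOF

# Possible usage (an absolute path to the file is safer than a relative one):
#   ./waf --command-template="valgrind --leak-check=full --show-reachable=yes --suppressions=/path/to/gtk-libs.supp %s" --run first
```

Suppressed reports are still counted in the "suppressed:" figure on the ERROR SUMMARY line, so it remains visible that something was hidden.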



Common errors

Please list hints here about the kinds of errors that have a known resolution.

Failure to call Simulator::Destroy()

Simulator::Destroy() frees memory that was allocated through the ns-3 object system. Forgetting to call Simulator::Destroy() when you are done leads valgrind to report reachable memory, as in this trace of the main-propagation-loss example:


 ==16325== 88 bytes in 1 blocks are still reachable in loss record 4 of 4
 ==16325==    at 0x4A069D5: operator new(unsigned long) (vg_replace_malloc.c:261)
 ==16325==    by 0x4C87A6B: ns3::TypeId ns3::TypeId::AddConstructor<ns3::DefaultSimulatorImpl>()::Maker::Create() (type-id.h:429)
 ==16325==    by 0x4C778A3: ns3::FunctorCallbackImpl<ns3::ObjectBase* (*)(), ns3::ObjectBase*, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty>::operator()() (callback.h:166)
 ==16325==    by 0x4CBF22C: ns3::Callback<ns3::ObjectBase*, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty>::operator()() const (callback.h:407)
 ==16325==    by 0x4CBE05E: ns3::ObjectFactory::Create() const (object-factory.cc:69)
 ==16325==    by 0x4C8569E: ns3::Ptr<ns3::SimulatorImpl> ns3::ObjectFactory::Create<ns3::SimulatorImpl>() const (object-factory.h:110)
 ==16325==    by 0x4C823FB: ns3::GetImpl() (simulator.cc:93)
 ==16325==    by 0x4C831B1: ns3::Simulator::Stop(ns3::Time const&) (simulator.cc:184)
 ==16325==    by 0x404157: TestDeterministic(ns3::Ptr<ns3::PropagationLossModel>) (main-propagation-loss.cc:83)
 ==16325==    by 0x405CC1: main (main-propagation-loss.cc:225)
 ==16325== 
 ==16325== LEAK SUMMARY:
 ==16325==    definitely lost: 0 bytes in 0 blocks
 ==16325==    indirectly lost: 0 bytes in 0 blocks
 ==16325==      possibly lost: 0 bytes in 0 blocks
 ==16325==    still reachable: 200 bytes in 4 blocks
 ==16325==         suppressed: 0 bytes in 0 blocks
 ==16325== 
 ==16325== For counts of detected and suppressed errors, rerun with: -v
 ==16325== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)

Here, the clue is the line that says "ns3::Ptr<ns3::SimulatorImpl> ns3::ObjectFactory::Create<ns3::SimulatorImpl>() const (object-factory.h:110)" and the lines above it. If valgrind reports "still reachable" blocks with a trace like this one, you have most likely forgotten to call Simulator::Destroy() to free objects that were created through the object factory.
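
The same check can be run against a saved report. The sketch below is a hypothetical helper, not an ns-3 tool: it looks for a "still reachable" loss record together with the ObjectFactory::Create<ns3::SimulatorImpl> frame from the trace above and prints a hint. The function name and the messages are assumptions.

```shell
# Hypothetical helper: prints a hint when a saved valgrind log contains
# "still reachable" blocks together with the SimulatorImpl factory frame.
hint_missing_destroy() {
    log="$1"
    if grep -q 'still reachable in loss record' "$log" &&
       grep -qF 'ns3::ObjectFactory::Create<ns3::SimulatorImpl>' "$log"; then
        echo "possible missing Simulator::Destroy()"
    else
        echo "no Simulator::Destroy() hint found"
    fi
}

# Possible usage, assuming the report was saved to valgrind.log:
#   hint_missing_destroy valgrind.log
```

A match is only a hint: the trace still has to be read to confirm where the block was allocated.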

test.py memory leaks

Sometimes a memory leak pops up in a test program (one of those enabled by "configure --enable-tests" and launched with "./test.py -g something"). That's OK: the test programs exist for precisely this purpose, to reveal memory leaks and to check that ns-3 behaves correctly.

The trick, however, is to *not* search for the memory leak using the test program. Write an equivalent ns-3 simulation, the simpler the better. Then run valgrind on that one, find the leak, and fix it. The rationale is that the test programs are launched by a slightly different system than the usual waf launcher (one built to run many different tests sequentially). Hence, the valgrind report may contain a lot of obscure data that will probably point you in the wrong direction.

If, however, you cannot replicate the memory leak in a "normal" ns-3 simulation, remember that the leak might well be in the test program itself... in that case: good luck.