HOWTO use oprofile
This is a brief HOWTO on using oprofile to statistically sample the execution performance of an ns-3 program.
Background
There are several open source profilers, including gprof, oprofile, sysprof, and valgrind. This HOWTO focuses on oprofile, which is a good fit for ns-3: ns-3 programs are logic-heavy, with lots of small functions and templates, so a statistical profiler such as oprofile is more relevant than a profiler that counts instructions, such as gprof or valgrind.
We will be using operf and opreport; some documentation is found here:
Please read this tutorial on oprofile using opcontrol; it explains some of the reasoning behind the testing methodology: http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html
The ns-3 manual also has a section on profiling tools with ns-3. Some of the tools have some CMake integration (but oprofile does not).
Use
oprofile is available as a package on Linux distros (e.g. 'apt install oprofile').
First, it is strongly recommended to (1) build ns-3 as a static library with optimizations, and (2) disable other examples and tests.
$ ./ns3 configure -d optimized --enable-static
If you skip the "--enable-static", you will end up profiling your main program only and not the library code.
If you enable examples, you will use an enormous amount of disk space (25 GB or more) due to the static linking of all examples.
Your main program should be in the `scratch/` directory (which is still compiled even if examples are disabled).
Example usage
I'll use the program 'examples/routing/manet-routing-compare.cc' as an example. First, copy the program to the scratch directory and rename it:
$ cp examples/routing/manet-routing-compare.cc scratch/mrc.cc
Next, configure and build ns-3:
$ ./ns3 clean
$ ./ns3 configure -d optimized --enable-static
$ ./ns3 build
Then run it under `operf`, a program that is part of oprofile. Since operf requires sudo privileges, and since the `ns3` script disallows sudo, we will change into the build directory and run the binary directly, as follows:
$ cd build/scratch
$ sudo operf ./ns3-dev-mrc-optimized
Note that if you are running a release of ns-3, your binary may be named differently such as `ns3.41-mrc-optimized`.
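If you script the profiling run, you can look up the binary name instead of hard-coding it. The following is a minimal sketch; the helper name `find_scratch_binary` is hypothetical, and the pattern `ns3<version>-<program>-<profile>` is only inferred from the two names mentioned above, so adjust it if your build names binaries differently.

```python
from pathlib import Path


def find_scratch_binary(build_dir, program, profile="optimized"):
    """Return scratch binaries whose names look like ns3<version>-<program>-<profile>.

    The naming pattern is an assumption inferred from names such as
    ns3-dev-mrc-optimized and ns3.41-mrc-optimized.
    """
    scratch = Path(build_dir) / "scratch"
    # sorted() gives a stable order if several versions are present
    return sorted(scratch.glob(f"ns3*-{program}-{profile}"))
```

For example, `find_scratch_binary("build", "mrc")` returns a list of matching paths under `build/scratch`, or an empty list if the program has not been built.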
Operf will print out something like this:
operf: Profiler started
Profiling done.
You may see these warnings, and can probably ignore them:
WARNING: Lost samples detected! See /path/to/ns-3-dev/oprofile_data/samples/operf.log for details. Lowering the sampling rate may reduce or eliminate lost samples. See the '--events' option description in the operf man page for help.
or
WARNING! Some of the events were throttled. Throttling occurs when the initial sample rate is too high, causing an excessive number of interrupts. Decrease the sampling frequency. Check the directory /path/to/oprofile_data/samples/current/stats/throttled for the throttled event names.
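Both warnings suggest lowering the sampling rate, which with operf means raising the per-event count given to `--events`. As a rough sketch of the arithmetic: for a cycle-counting event, one sample is taken every `count` unhalted cycles, so a 3900 MHz core with a count of 100000 (the values in the report header shown later in this HOWTO) yields about 39,000 samples per second while busy. The helper names below are hypothetical, the event name depends on your CPU (`ophelp` lists the available events), and each event has a minimum allowed count.

```python
def samples_per_second(cpu_hz, count):
    """Approximate sample rate on one fully busy core for a cycle-counting event."""
    return cpu_hz / count


def count_for_rate(cpu_hz, target_rate):
    """Per-event count that gives roughly target_rate samples per second."""
    return round(cpu_hz / target_rate)


def operf_command(binary, event, count):
    """Build an operf argument list using the name:count event form.

    The event name is an assumption; it must be one that ophelp reports
    for your processor.
    """
    return ["operf", "--events", f"{event}:{count}", binary]
```

For instance, `count_for_rate(3_900_000_000, 10_000)` gives 390000, and `operf_command("./ns3-dev-mrc-optimized", "CPU_CLK_UNHALTED", 390000)` produces the arguments for a run at roughly a quarter of the sampling rate above (prefix it with `sudo` as before).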
This leaves the sampling data in an `oprofile_data/` directory under the current working directory. Next, you can use `opreport` to produce a report. Suggested options are:
# opreport --exclude-dependent --demangle=smart --symbols --threshold=1 > opreport.out
This directs the output to a file. The `--threshold=1` option limits the report to symbols that account for more than 1% of the samples, which keeps the output short. Let's look at the beginning of this opreport.out file:
$ head -10 opreport.out
CPU: Intel Ivy Bridge microarchitecture, speed 3900 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1398     3.8657  ns3::InterferenceHelper::AppendEvent(ns3::Ptr<ns3::Event>, bool)
1173     3.2436  _int_free
1067     2.9504  pair<_Rb_tree_iterator<pair<ns3::Scheduler::EventKey const, ns3::EventImpl*>>, bool> ...
1000     2.7652  __ieee754_log_avx
981      2.7126  malloc
929      2.5689  __ieee754_pow_sse2
906      2.5053  ns3::ConstantVelocityHelper::Update() const
833      2.3034  _int_malloc
679      1.8776  ns3::YansWifiChannel::Send(ns3::Ptr<ns3::YansWifiPhy>, ns3::Ptr<ns3::WifiPpdu const>, double) const
This illustrates that InterferenceHelper::AppendEvent(), ConstantVelocityHelper::Update(), and YansWifiChannel::Send() were the most frequently sampled ns-3 methods. If your program runs much longer than expected, you may see a single problematic method in the output consuming over half of the CPU clock cycles, and you can then try to debug (perhaps with logging or a debugger) why the program spends so much time there.
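To see at a glance how the sampled time divides between ns-3 code and everything else (allocator, math library, standard containers), the report can be post-processed. The following is a minimal sketch that assumes the three-column `samples % symbol name` layout shown above; the function names are hypothetical, and symbols are counted as ns-3 only when they start with `ns3::`, so standard-library templates instantiated on ns-3 types fall into the other group.

```python
import re
from collections import defaultdict

# Matches symbol rows such as "1398 3.8657 ns3::InterferenceHelper::AppendEvent(...)".
# Header lines do not start with a number, so they never match.
ROW = re.compile(r"^\s*(\d+)\s+(\d+(?:\.\d+)?(?:e[+-]?\d+)?)\s+(.+?)\s*$")


def parse_opreport(text):
    """Return a list of (samples, percent, symbol) tuples from opreport text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), m.group(3)))
    return rows


def share_by_group(rows):
    """Sum the percent column for ns3:: symbols versus all other symbols."""
    totals = defaultdict(float)
    for _samples, percent, symbol in rows:
        group = "ns3" if symbol.startswith("ns3::") else "other"
        totals[group] += percent
    return dict(totals)
```

A typical use would be `share_by_group(parse_opreport(open("opreport.out").read()))`. Because `--threshold=1` drops small symbols, the two totals will not add up to 100%.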
When you are done, you may want to remove the 'oprofile_data' directory since it is owned by root:
$ sudo rm -rf oprofile_data