HOWTO use oprofile
This is a brief HOWTO on using oprofile to statistically sample the execution performance of an ns-3 program.
There are several open source profilers, including gprof, oprofile, sysprof, and valgrind. This HOWTO focuses on oprofile, which is a good tool for ns-3 because ns-3 programs are logic-heavy, with many small functions and templates, so a statistical profiler such as oprofile is more relevant than a profiler that counts instructions, such as gprof or valgrind.
We will be using operf and opreport; documentation is available in the operf and opreport man pages.
Please read this tutorial on oprofile, which uses the older opcontrol interface; it describes some of the background behind the testing methodology: http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html
oprofile is available as a package on Linux distros (e.g. 'yum install oprofile').
First, it is strongly recommended to build ns-3 as optimized code before profiling.
$ ./waf configure -d optimized --enable-examples --enable-static
If you skip the "--enable-static", you will end up profiling your main program only and not the library code.
Caution: You will probably want to tailor your build to avoid statically linking all of the examples, either by reducing the set of enabled modules, or by dropping the '--enable-examples' option and placing your programs in the scratch directory. A full static build of ns-3 and all examples can take up to 25 GB of disk space.
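As a rough sketch of the reduced build described in the caution above, the configure step might look like the following. The module list is only an assumption for illustration; substitute the modules that your own scratch program actually needs, or drop '--enable-modules' entirely to build every module.

```shell
# Optimized, static build without --enable-examples, so only the
# libraries and the programs in scratch/ are linked statically.
# The module list below is a hypothetical example, not a recommendation.
./waf configure -d optimized --enable-static --enable-modules=core,network,internet
./waf build
```

Programs placed in the scratch directory are built along with the libraries, so they can then be profiled in the same way as an example program.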
Previous versions of this HOWTO talked about using a program called 'opcontrol'. This has been deprecated in favor of the 'operf' program. We also make use of 'opreport' below. All of these should be installed as part of installing the oprofile package.
I'll use the program 'examples/routing/manet-routing-compare.cc' as an example:
$ operf ./waf --run 'manet-routing-compare'
You will see something like this:
operf: Profiler started
Waf: Entering directory `/path/to/ns-3-dev/build'
You may see these warnings, and can probably ignore them:
WARNING: Lost samples detected! See /path/to/ns-3-dev/oprofile_data/samples/operf.log for details. Lowering the sampling rate may reduce or eliminate lost samples. See the '--events' option description in the operf man page for help.
WARNING! Some of the events were throttled. Throttling occurs when the initial sample rate is too high, causing an excessive number of interrupts. Decrease the sampling frequency. Check the directory /path/to/oprofile_data/samples/current/stats/throttled for the throttled event names.
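If the warnings above become a problem, the sampling rate can be lowered with the '--events' option that the first warning mentions. The following is a sketch only: it assumes that your CPU exposes the CPU_CLK_UNHALTED event (event names vary by processor; 'ophelp' lists the ones available), and the count of 500000 is an arbitrary illustrative value.

```shell
# Take one sample every 500000 unhalted clock cycles. A larger count
# means fewer samples and fewer interrupts.
# Event name and count are assumptions; check 'ophelp' for your CPU.
operf --events CPU_CLK_UNHALTED:500000 ./waf --run 'manet-routing-compare'
```

A larger count reduces lost and throttled samples at the cost of coarser statistics, so a longer simulation run may be needed to get a stable profile.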
This leaves the sample data in the oprofile_data/ directory. Next, you can use opreport to produce a report. Suggested options are:
$ opreport --exclude-dependent --demangle=smart --symbols --threshold=1 --exclude-symbols=/usr/bin/python2.7 > opreport.out
This directs the output to a file. The '--threshold=1' option limits the report to symbols that account for more than 1% of the samples, which keeps the output short. The '--exclude-symbols' option is used to remove Python from the statistics. Let's look at the beginning of this opreport.out file:
$ head -20 opreport.out
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        symbol name
348115   19.5607  ns3::IntegralFunction(double, void*)
78415     4.4062  ns3::aodv::RoutingTable::Purge()
67923     3.8166  ns3::DcfManager::GetBackoffStartFor(ns3::DcfState*)
42891     2.4101  ns3::DcfManager::UpdateBackoff()
33460     1.8801  ns3::MapScheduler::Insert(ns3::Scheduler::Event const&)
32708     1.8379  ns3::int64x64_t::Mul(ns3::int64x64_t const&)
30233     1.6988  ns3::Simulator::Now()
27347     1.5366  ns3::DcfManager::GetAccessGrantStart() const
25400     1.4272  ns3::YansWifiPhy::StartReceivePreambleAndHeader(ns3::Ptr<ns3::Packet>, double, ns3::Time)
22664     1.2735  ns3::YansWifiChannel::Send(ns3::Ptr<ns3::YansWifiPhy>, ns3::Ptr<ns3::Packet const>, double, ns3::Time) const
21769     1.2232  ns3::InterferenceHelper::CalculateNoiseInterferenceW(ns3::Ptr<ns3::InterferenceHelper::Event>, vector<ns3::InterferenceHelper::NiChange>*) const
21365     1.2005  ns3::DefaultSimulatorImpl::Now() const
20968     1.1782  ns3::DcfManager::MostRecent(ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time) const
20233     1.1369  ns3::WifiPhy::GetMobility()
19682     1.1059  ns3::RngStream::RandU01()
The output shows that ns3::IntegralFunction() and ns3::aodv::RoutingTable::Purge() together account for roughly 24% of the samples, which makes them good candidates for exploring whether their usage can be further optimized.
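To pick out the heaviest symbols from a long report, the file can be filtered with a small shell helper. This is a sketch rather than part of oprofile: the function name top_symbols and its default cutoff of 2% are made up for illustration, and it assumes the three-column "samples, %, symbol name" layout produced by the opreport options suggested above.

```shell
# top_symbols FILE [MIN_PERCENT]
# Print the symbols in an opreport symbol listing whose share of samples
# is at least MIN_PERCENT (default 2). Hypothetical helper, not part of
# oprofile.
top_symbols() {
    awk -v min="${2:-2}" '
        # Data rows start with an integer sample count and a percentage;
        # the header lines do not, so they are skipped.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 + 0 >= min + 0 {
            pct = $2
            # Strip the first two columns; the rest is the symbol name,
            # which may itself contain spaces.
            sub(/^[ \t]*[0-9]+[ \t]+[0-9.]+[ \t]+/, "")
            printf "%7.2f%%  %s\n", pct, $0
        }
    ' "$1"
}

# Example: list every symbol at or above 2% if a report is present.
if [ -f opreport.out ]; then
    top_symbols opreport.out 2
fi
```

Raising the second argument narrows the list to the functions that dominate the run, which is usually where optimization effort pays off first.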
Thanks to Biljana Bojovic for updating this HOWTO recently.