HOWTO use oprofile

From Nsnam
Latest revision as of 22:58, 21 March 2024


This is a brief HOWTO on using oprofile to statistically sample the execution performance of an ns-3 program.

Background

There are several open source profilers, including gprof, oprofile, sysprof, and valgrind. This HOWTO focuses on oprofile, which suits ns-3 because ns-3 programs are logic-heavy, with lots of small functions and templates. For such code, a statistical profiler such as oprofile is more relevant than a profiler that counts instructions, such as gprof or valgrind.

We will be using operf and opreport; some documentation is found here:

Please read this tutorial on oprofile (it uses opcontrol rather than operf); it explains some of the background behind the testing methodology: http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html

The ns-3 manual also has a section on profiling tools with ns-3. Some of the tools have some CMake integration (but oprofile does not).

Use

oprofile is available as a package on Linux distros (e.g. 'apt install oprofile').

First, it is strongly recommended to (1) build ns-3 as a static library with optimizations, and (2) disable other examples and tests.

 ./ns3 configure -d optimized --enable-static

If you skip the "--enable-static", you will end up profiling your main program only and not the library code.

If you enable examples, you will use an enormous amount of disk space (25 GB or more) due to the static linking of all examples.

Your main program should be in the `scratch/` directory (which is still compiled even if examples are disabled).

Example usage

I'll use the program 'examples/routing/manet-routing-compare.cc' as an example. First, copy the program to the scratch directory and rename it:

 $ cp examples/routing/manet-routing-compare.cc scratch/mrc.cc

Next, configure and build ns-3:

 $ ./ns3 clean
 $ ./ns3 configure -d optimized --enable-static
 $ ./ns3 build

Then run it under `operf`, which is part of oprofile. Since operf requires sudo privileges, and since the `ns3` script disallows sudo, we will change into the build directory and run the binary directly, as follows:

 $ cd build/scratch
 $ sudo operf ./ns3-dev-mrc-optimized

Note that if you are running a release of ns-3, your binary may be named differently, such as `ns3.41-mrc-optimized`.
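
Because the version prefix varies, a shell glob can locate the binary without knowing it in advance. The following is a minimal sketch; the function name `find_scratch_binary` is hypothetical, and it assumes the `ns3<version>-<name>-optimized` naming pattern seen in the two names above:

```shell
# Hypothetical helper: print scratch binaries matching a program name,
# whatever version prefix (ns3-dev, ns3.41, ...) the build used.
find_scratch_binary() {
    dir="$1"    # directory to search, e.g. build/scratch
    name="$2"   # scratch program name, e.g. mrc
    for f in "$dir"/ns3*-"$name"-optimized; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage: find_scratch_binary build/scratch mrc
```

If nothing is printed, the program was probably not built in optimized mode or is not in the scratch directory.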

Operf will print out something like this:

 operf: Profiler started
 Profiling done.

You may see these warnings, and can probably ignore them:

 WARNING: Lost samples detected! See /path/to/ns-3-dev/oprofile_data/samples/operf.log for details.
 Lowering the sampling rate may reduce or eliminate lost samples.
 See the '--events' option description in the operf man page for help.

or

 WARNING! Some of the events were throttled. Throttling occurs when
 the initial sample rate is too high, causing an excessive number of
 interrupts.  Decrease the sampling frequency. Check the directory
 /path/to/oprofile_data/samples/current/stats/throttled
 for the throttled event names.
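
If the warnings do matter for your run, both suggest sampling less often. With operf this is done through the `--events` option, whose specification has the form `name:count`; a larger count means fewer samples. The sketch below only prints such a command rather than running it. The helper name `operf_cmd` is hypothetical, and the count of 200000 is an arbitrary illustration (twice the 100000 that appears in the report header further down), not a recommended value:

```shell
# Hypothetical helper: print (not run) an operf command line with an
# explicit sampling count for one event.
operf_cmd() {
    event="$1"    # event name, e.g. CPU_CLK_UNHALTED
    count="$2"    # number of events between samples, e.g. 200000
    binary="$3"   # program to profile
    printf 'sudo operf --events %s:%s %s\n' "$event" "$count" "$binary"
}

operf_cmd CPU_CLK_UNHALTED 200000 ./ns3-dev-mrc-optimized
```

Check the operf man page for the event names and minimum counts supported on your processor before running the printed command.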

This leaves the sampling data in an `oprofile_data/` directory under the current working directory. Next, you can use `opreport` to produce a report. Suggested options are:

 # opreport --exclude-dependent --demangle=smart --symbols --threshold=1 > opreport.out

This directs the output to a file. The `--threshold=1` option reduces the amount of output by omitting symbols that account for less than 1% of the samples. Let's look at the beginning of this opreport.out file:

 $ head -12 opreport.out
 CPU: Intel Ivy Bridge microarchitecture, speed 3900 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
 samples  %        symbol name
 1398      3.8657  ns3::InterferenceHelper::AppendEvent(ns3::Ptr<ns3::Event>, bool)
 1173      3.2436  _int_free
 1067      2.9504  pair<_Rb_tree_iterator<pair<ns3::Scheduler::EventKey const, ns3::EventImpl*>>, bool> ...
 1000      2.7652  __ieee754_log_avx
 981       2.7126  malloc
 929       2.5689  __ieee754_pow_sse2
 906       2.5053  ns3::ConstantVelocityHelper::Update() const
 833       2.3034  _int_malloc
 679       1.8776  ns3::YansWifiChannel::Send(ns3::Ptr<ns3::YansWifiPhy>, ns3::Ptr<ns3::WifiPpdu const>, double) const

This illustrates that, among ns-3 methods, InterferenceHelper::AppendEvent(), ConstantVelocityHelper::Update(), and YansWifiChannel::Send() were sampled most frequently. If your program takes unexpectedly long to run, you may see a single problematic method consuming over half of the CPU clock cycles, and you can then try to debug (perhaps with logging or a debugger) why the program spends so much time there.
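
Since the report mixes ns-3 methods with libc and libm symbols, it can help to list the ns-3 entries on their own. The following is a minimal sketch using awk; the function name `ns3_symbols` is hypothetical, and it assumes the three-column layout shown above (samples, percent, symbol name):

```shell
# Hypothetical helper: list only ns-3 symbols from an opreport symbol
# report, one "percent<TAB>symbol" line per entry.
ns3_symbols() {
    awk '$3 ~ /^ns3::/ {
        sym = $3
        for (i = 4; i <= NF; i++) sym = sym " " $i   # symbol may contain spaces
        printf "%s\t%s\n", $2, sym
    }' "$1"
}

# Usage: ns3_symbols opreport.out
```

Only lines whose symbol name begins with `ns3::` are kept, so template instantiations that merely mention ns-3 types (such as the `pair<...>` entry above) are skipped.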

When you are done, you may want to remove the 'oprofile_data' directory since it is owned by root:

 $ sudo rm -rf oprofile_data


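Additional references

* http://mailman.isi.edu/pipermail/ns-developers/2009-February/005290.html (dead link at the moment)
* https://groups.google.com/forum/?fromgroups=#!topic/ns-3-users/XkUrS_XMEUc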