Difference between revisions of "HOWTO use oprofile"

From Nsnam
Jump to: navigation, search
(provide oprofile example)
(caution about static builds)
 
(2 intermediate revisions by the same user not shown)
Line 5: Line 5:
 
= Background =
 
= Background =
  
Please read this tutorial on oprofile:  http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html
+
here are several open source profilers, including gprof, oprofile, sysprof, and valgrind. This HOWTO focuses on oprofile, which is a good tool for ns-3 because ns-3 programs are logic-heavy with lots of small functions and templates, and a statistical profiler such as oprofile is more relevant than a profile that counts instructions such as gprof or valgrind.
  
There are several open source profilers, including gprof, oprofile, sysprof, and valgrind. This HOWTO focuses on oprofile, which is a good tool for ns-3 because ns-3 programs are logic-heavy with lots of small functions and templates, and therefore a statistical profiler such as oprofile is more relevant than a profile that counts instructions such as gprof or valgrind.
+
We will be using '''operf''' and '''opreport'''; some documentation is found here:
 +
 
 +
* [http://oprofile.sourceforge.net/doc/overview.html#getting-started-with-operf operf]
 +
* [http://oprofile.sourceforge.net/doc/opreport.html opreport]
 +
 
 +
Please read this tutorial on oprofile using opcontrol; it describes a bit behind the testing methodology:  http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html
  
 
= Use =
 
= Use =
Line 17: Line 22:
 
   ./waf configure -d optimized --enable-examples --enable-static
 
   ./waf configure -d optimized --enable-examples --enable-static
  
Second, you will need to run as root.  Third you will want to run the program outside of waf, which may require you to set the LD_LIBRARY_PATH variable to find the ns-3 libraries.
+
If you skip the "--enable-static", you will end up profiling your main program only and not the library code.
  
  # opcontrol --deinit
+
'''Caution:''' You probably will want to tailor your build to avoid static linking of all examples, either by reducing the scope of the enabled modules, or by avoiding the '--enable-examples' option by using programs in the scratch directory. A full static build of ns-3 and all examples can take up to 25GB of disk.
  # opcontrol --no-vmlinux
+
  # opcontrol --separate=kernel
+
  # opcontrol --init
+
  # opcontrol --reset
+
  # opcontrol --start
+
  # ./test-program
+
  # opcontrol --stop
+
  # opreport -l ./test-program
+
  # opcontrol --deinit
+
  
 +
Previous versions of this HOWTO talked about using a program called 'opcontrol'.  This has been deprecated in favor of the 'operf' program.  We also make use of 'opreport' below.  All of these should be installed as part of installing the oprofile package.
  
Note:  You may need to stop the NMI watchdog to free up counter 0 prior to starting. 
+
= Example usage =
If you receive an error on opcontrol --start, try this:
+
  
  echo 0 > /proc/sys/kernel/nmi_watchdog
+
I'll use the program 'examples/wireless/manet-routing-compare.cc' as an example:
  
For more information:
+
  $ operf ./waf --run 'manet-routing-compare'
http://smackerelofopinion.blogspot.com/2013/05/getting-started-with-oprofile-on-ubuntu.html
+
  
= Example =
+
You will see something like this:
  
This is a recent run of the example program ''examples/routing/manet-routing-compare.cc'' on a 64-bit Fedora 20 Linux machine.  I had first used 'yum' to install oprofile.
+
  operf: Profiler started
 +
  Waf: Entering directory `/path/to/ns-3-dev/build'
  
First, as non-privileged user, I built ns-3:
+
You may see these warnings, and can probably ignore them:
  
   $ ./waf configure -d optimized --enable-static --enable-examples --enable-tests && ./waf build
+
   WARNING: Lost samples detected! See /path/to/ns-3-dev/oprofile_data/samples/operf.log for details.
 +
  Lowering the sampling rate may reduce or eliminate lost samples.
 +
  See the '--events' option description in the operf man page for help.
  
Next, I sudo'ed to root shell and did the following.  First, I added to my LD_LIBRARY_PATH environment variable the location of the ns-3 libraries, which are in the 'build/' directory:
+
or
+
  # export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:build/
+
  
Next, I stepped through the commands above:
+
  WARNING! Some of the events were throttled. Throttling occurs when
 +
  the initial sample rate is too high, causing an excessive number of
 +
  interrupts.  Decrease the sampling frequency. Check the directory
 +
  /path/to/oprofile_data/samples/current/stats/throttled
 +
  for the throttled event names.
  
  # opcontrol --deinit
+
This leaves a sampling data directory in oprofile_data/ directory location.  Next, you can use opreport to produce a report.  Suggested options are:
  # opcontrol --no-vmlinux
+
  # opcontrol --separate=kernel
+
  # opcontrol --init
+
  # opcontrol --reset
+
  # opcontrol --start
+
  
At this point, I received an error: 
+
   # opreport --exclude-dependent --demangle=smart --symbols --threshold=1 --exclude-symbols=/usr/bin/python2.7 > opreport.out
+
   ...
+
  Error: counter 0 not available nmi_watchdog using this resource
+
  
I needed to disable this watchdog as stated above:
+
This directs the output to a file.  threshold is specified to reduce the amount of output statistics. exclude-symbols is used to remove Python from statistics calculation. Let's look at the beginning of this opreport.out file:
  
   # echo 0 > /proc/sys/kernel/nmi_watchdog
+
   $ head -20 opreport.out
 +
  Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
 +
  samples  %        symbol name
 +
  348115  19.5607  ns3::IntegralFunction(double, void*)
 +
  78415    4.4062  ns3::aodv::RoutingTable::Purge()
 +
  67923    3.8166  ns3::DcfManager::GetBackoffStartFor(ns3::DcfState*)
 +
  42891    2.4101  ns3::DcfManager::UpdateBackoff()
 +
  33460    1.8801  ns3::MapScheduler::Insert(ns3::Scheduler::Event const&)
 +
  32708    1.8379  ns3::int64x64_t::Mul(ns3::int64x64_t const&)
 +
  30233    1.6988  ns3::Simulator::Now()
 +
  27347    1.5366  ns3::DcfManager::GetAccessGrantStart() const
 +
  25400    1.4272  ns3::YansWifiPhy::StartReceivePreambleAndHeader(ns3::Ptr<ns3::Packet>, double, ns3::Time)
 +
  22664    1.2735  ns3::YansWifiChannel::Send(ns3::Ptr<ns3::YansWifiPhy>, ns3::Ptr<ns3::Packet const>, double, ns3::Time) const
 +
  21769    1.2232  ns3::InterferenceHelper::CalculateNoiseInterferenceW(ns3::Ptr<ns3::InterferenceHelper::Event>, vector<ns3::InterferenceHelper::NiChange>*) const
 +
  21365    1.2005  ns3::DefaultSimulatorImpl::Now() const
 +
  20968    1.1782  ns3::DcfManager::MostRecent(ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time) const
 +
  20233    1.1369  ns3::WifiPhy::GetMobility()
 +
  19682    1.1059  ns3::RngStream::RandU01()
  
Then I ran deinit and started over.
+
The output show that IntegralFunction () and aodv::RoutingTable::Purge() are good candidates to explore whether their usage can be further optimized.
 
+
  # opcontrol --deinit
+
  # opcontrol --no-vmlinux
+
  # opcontrol --separate=kernel
+
  # opcontrol --init
+
  # opcontrol --reset
+
  # opcontrol --start
+
 
+
At this point, I can run programs.  The program that I want is found in 'build/examples/routing/ns3-dev-manet-routing-compare-optimized'.
+
 
+
  # cd build/examples/routing
+
  # ./ns3-dev-manet-routing-compare-optimized
+
 
+
Once the program finished, I did:
+
 
+
  # opcontrol --stop
+
  # opreport -l ./ns3-dev-manet-routing-compare-optimized > /tmp/opreport.out
+
  # opcontrol --deinit
+
 
+
This left a copy of my opreport in a temporary file.  Let's look at the beginning of this file:
+
 
+
 
+
  # head -20 /tmp/opreport.txt
+
  CPU: Core 2, speed 2.4e+06 MHz (estimated)
+
  Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
+
  samples  %        image name              symbol name
+
  43062    3.8673  libm-2.18.so            __ieee754_pow_sse2
+
  40383    3.6267  libc-2.18.so            _int_malloc
+
  33561    3.0140  ns3-dev-manet-routing-compare-optimized ns3::MapScheduler::Insert(ns3::Scheduler::Event const&)
+
  31310    2.8119  libm-2.18.so            __ieee754_log_sse2
+
  31284    2.8095  ns3-dev-manet-routing-compare-optimized ns3::DcfManager::UpdateBackoff()
+
  30043    2.6981  ns3-dev-manet-routing-compare-optimized ns3::int64x64_t::Mul(ns3::int64x64_t const&)
+
  29855    2.6812  ns3-dev-manet-routing-compare-optimized ns3::DcfManager::MostRecent(ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time) const
+
  29347    2.6356  ns3-dev-manet-routing-compare-optimized ns3::DcfManager::GetAccessGrantStart() const
+
  28877    2.5934  libc-2.18.so            _int_free
+
  25012    2.2463  libstdc++.so.6.0.19      std::_Rb_tree_increment(std::_Rb_tree_node_base*)
+
  23880    2.1446  ns3-dev-manet-routing-compare-optimized ns3::DcfManager::GetBackoffStartFor(ns3::DcfState*)
+
  21907    1.9674  ns3-dev-manet-routing-compare-optimized ns3::aodv::RoutingTable::Purge()
+
  20875    1.8747  ns3-dev-manet-routing-compare-optimized ns3::Ptr<ns3::Packet>::~Ptr()
+
  20265    1.8200  ns3-dev-manet-routing-compare-optimized ns3::DefaultSimulatorImpl::Now() const
+
  18734    1.6825  ns3-dev-manet-routing-compare-optimized ns3::InterferenceHelper::CalculatePer(ns3::Ptr<ns3::InterferenceHelper::Event const>, std::vector<ns3::InterferenceHelper::NiChange, std::allocator<ns3::InterferenceHelper::NiChange> >*) const
+
  18368    1.6496  libc-2.18.so            malloc
+
  17441    1.5663  ns3-dev-manet-routing-compare-optimized ns3::ConstantVelocityHelper::Update() const
+
  
Besides the event scheduling (the top-most ns-3 function listed above), the program spends a lot of time in the Wifi DCF code, purging the AODV routing table, in Wifi interference calculations, in mobility updates, etc...
+
Thanks to Biljana Bojovic for updating this HOWTO recently.
  
 
= Additional references =
 
= Additional references =

Latest revision as of 18:28, 28 April 2017

Main Page - Current Development - Developer FAQ - Tools - Related Projects - Project Ideas - Summer Projects

Installation - Troubleshooting - User FAQ - HOWTOs - Samples - Models - Education - Contributed Code - Papers

This is a brief HOWTO on using oprofile to statistically sample the execution performance of an ns-3 program.

Background

here are several open source profilers, including gprof, oprofile, sysprof, and valgrind. This HOWTO focuses on oprofile, which is a good tool for ns-3 because ns-3 programs are logic-heavy with lots of small functions and templates, and a statistical profiler such as oprofile is more relevant than a profile that counts instructions such as gprof or valgrind.

We will be using operf and opreport; some documentation is found here:

Please read this tutorial on oprofile using opcontrol; it describes a bit behind the testing methodology: http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html

Use

oprofile is available as a package on Linux distros (e.g. 'yum install oprofile').

First, it is strongly recommended to build ns-3 as optimized code before profiling.

 ./waf configure -d optimized --enable-examples --enable-static

If you skip the "--enable-static", you will end up profiling your main program only and not the library code.

Caution: You probably will want to tailor your build to avoid static linking of all examples, either by reducing the scope of the enabled modules, or by avoiding the '--enable-examples' option by using programs in the scratch directory. A full static build of ns-3 and all examples can take up to 25GB of disk.

Previous versions of this HOWTO talked about using a program called 'opcontrol'. This has been deprecated in favor of the 'operf' program. We also make use of 'opreport' below. All of these should be installed as part of installing the oprofile package.

Example usage

I'll use the program 'examples/wireless/manet-routing-compare.cc' as an example:

 $ operf ./waf --run 'manet-routing-compare'

You will see something like this:

 operf: Profiler started
 Waf: Entering directory `/path/to/ns-3-dev/build'

You may see these warnings, and can probably ignore them:

 WARNING: Lost samples detected! See /path/to/ns-3-dev/oprofile_data/samples/operf.log for details.
 Lowering the sampling rate may reduce or eliminate lost samples.
 See the '--events' option description in the operf man page for help.

or

 WARNING! Some of the events were throttled. Throttling occurs when
 the initial sample rate is too high, causing an excessive number of
 interrupts.  Decrease the sampling frequency. Check the directory
 /path/to/oprofile_data/samples/current/stats/throttled
 for the throttled event names.

This leaves a sampling data directory in oprofile_data/ directory location. Next, you can use opreport to produce a report. Suggested options are:

 # opreport --exclude-dependent --demangle=smart --symbols --threshold=1 --exclude-symbols=/usr/bin/python2.7 > opreport.out

This directs the output to a file. threshold is specified to reduce the amount of output statistics. exclude-symbols is used to remove Python from statistics calculation. Let's look at the beginning of this opreport.out file:

 $ head -20 opreport.out
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
 samples  %        symbol name
 348115   19.5607  ns3::IntegralFunction(double, void*)
 78415     4.4062  ns3::aodv::RoutingTable::Purge()
 67923     3.8166  ns3::DcfManager::GetBackoffStartFor(ns3::DcfState*)
 42891     2.4101  ns3::DcfManager::UpdateBackoff()
 33460     1.8801  ns3::MapScheduler::Insert(ns3::Scheduler::Event const&)
 32708     1.8379  ns3::int64x64_t::Mul(ns3::int64x64_t const&)
 30233     1.6988  ns3::Simulator::Now()
 27347     1.5366  ns3::DcfManager::GetAccessGrantStart() const
 25400     1.4272  ns3::YansWifiPhy::StartReceivePreambleAndHeader(ns3::Ptr<ns3::Packet>, double, ns3::Time)
 22664     1.2735  ns3::YansWifiChannel::Send(ns3::Ptr<ns3::YansWifiPhy>, ns3::Ptr<ns3::Packet const>, double, ns3::Time) const
 21769     1.2232  ns3::InterferenceHelper::CalculateNoiseInterferenceW(ns3::Ptr<ns3::InterferenceHelper::Event>, vector<ns3::InterferenceHelper::NiChange>*) const
 21365     1.2005  ns3::DefaultSimulatorImpl::Now() const
 20968     1.1782  ns3::DcfManager::MostRecent(ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time) const
 20233     1.1369  ns3::WifiPhy::GetMobility()
 19682     1.1059  ns3::RngStream::RandU01()

The output show that IntegralFunction () and aodv::RoutingTable::Purge() are good candidates to explore whether their usage can be further optimized.

Thanks to Biljana Bojovic for updating this HOWTO recently.

Additional references