HOWTO use oprofile: Difference between revisions

From Nsnam
Jump to navigation Jump to search
(provide oprofile example)
(update oprofile instructions)
 
(3 intermediate revisions by the same user not shown)
Line 5: Line 5:
= Background =
= Background =


Please read this tutorial on oprofile:  http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html
There are several open source profilers, including gprof, oprofile, sysprof, and valgrind.  This HOWTO focuses on oprofile, which is a good tool for ns-3 because ns-3 programs are logic-heavy with lots of small functions and templates, and a statistical profiler such as oprofile is more relevant than a profile that counts instructions such as gprof or valgrind.


There are several open source profilers, including gprof, oprofile, sysprof, and valgrind. This HOWTO focuses on oprofile, which is a good tool for ns-3 because ns-3 programs are logic-heavy with lots of small functions and templates, and therefore a statistical profiler such as oprofile is more relevant than a profile that counts instructions such as gprof or valgrind.
We will be using '''operf''' and '''opreport'''; some documentation is found here:
 
* [http://oprofile.sourceforge.net/doc/overview.html#getting-started-with-operf operf]
* [http://oprofile.sourceforge.net/doc/opreport.html opreport]
 
Please read this tutorial on oprofile using opcontrol; it describes a bit behind the testing methodology:  http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html
 
The [https://www.nsnam.org/docs/manual/html/profiling.html ns-3 manual] also has a section on profiling tools with ns-3.  Some of the tools have some CMake integration (but oprofile does not).


= Use =
= Use =


oprofile is available as a package on Linux distros (e.g. 'yum install oprofile').
oprofile is available as a package on Linux distros (e.g. 'apt install oprofile').


First, it is strongly recommended to build ns-3 as optimized code before profiling.
First, it is strongly recommended to (1) build ns-3 as a static library with optimizations, and (2) disable other examples and tests.  


   ./waf configure -d optimized --enable-examples --enable-static
   ./ns3 configure -d optimized --enable-static


Second, you will need to run as root.  Third you will want to run the program outside of waf, which may require you to set the LD_LIBRARY_PATH variable to find the ns-3 libraries.
If you skip the "--enable-static", you will end up profiling your main program only and not the library code.


  # opcontrol --deinit
If you enable examples, you will use an enormous amount of disk space (25 GB or more) due to the static linking of all examples.
  # opcontrol --no-vmlinux
  # opcontrol --separate=kernel
  # opcontrol --init
  # opcontrol --reset
  # opcontrol --start
  # ./test-program
  # opcontrol --stop
  # opreport -l ./test-program
  # opcontrol --deinit


Your main program should be in the `scratch/` directory (which is still compiled even if examples are disabled).


Note:  You may need to stop the NMI watchdog to free up counter 0 prior to starting. 
= Example usage =
If you receive an error on opcontrol --start, try this:


  echo 0 > /proc/sys/kernel/nmi_watchdog
I'll use the program 'examples/wireless/manet-routing-compare.cc' as an example.  First, copy the program to the scratch directory and rename it:


For more information:
  $ cp examples/routing/manet-routing-compare.cc scratch/mrc.cc
http://smackerelofopinion.blogspot.com/2013/05/getting-started-with-oprofile-on-ubuntu.html


= Example =
Next, configure and build ns-3:


This is a recent run of the example program ''examples/routing/manet-routing-compare.cc'' on a 64-bit Fedora 20 Linux machine.  I had first used 'yum' to install oprofile.
  $ ./ns3 clean
  $ ./ns3 configure -d optimized --enable-static
  $ ./ns3 build


First, as non-privileged user, I built ns-3:
Then run it with a program called `operf` which is part of oprofile.  Since operf requires sudo privileges, and since the `ns3` script disallows sudo, we will recurse into the build directory and run the binary directly, as follows:


   $ ./waf configure -d optimized --enable-static --enable-examples --enable-tests && ./waf build
   $ cd build/scratch
  $ sudo operf ./ns3-dev-mrc-optimized


Next, I sudo'ed to root shell and did the following.  First, I added to my LD_LIBRARY_PATH environment variable the location of the ns-3 libraries, which are in the 'build/' directory:
Note that if you are running a release of ns-3, your binary may be named differently such as `ns3.41-mrc-optimized`.
  # export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:build/


Next, I stepped through the commands above:
Operf will print out something like this:


   # opcontrol --deinit
   operf: Profiler started
   # opcontrol --no-vmlinux
   Profiling done.
  # opcontrol --separate=kernel
  # opcontrol --init
  # opcontrol --reset
  # opcontrol --start


At this point, I received an error:
You may see these warnings, and can probably ignore them:
  ...
  Error: counter 0 not available nmi_watchdog using this resource


I needed to disable this watchdog as stated above:
  WARNING: Lost samples detected! See /path/to/ns-3-dev/oprofile_data/samples/operf.log for details.
  Lowering the sampling rate may reduce or eliminate lost samples.
  See the '--events' option description in the operf man page for help.


  # echo 0 > /proc/sys/kernel/nmi_watchdog
or


Then I ran deinit and started over.
  WARNING! Some of the events were throttled. Throttling occurs when
  the initial sample rate is too high, causing an excessive number of
  interrupts.  Decrease the sampling frequency. Check the directory
  /path/to/oprofile_data/samples/current/stats/throttled
  for the throttled event names.


  # opcontrol --deinit
This leaves a sampling data directory in oprofile_data/ directory location.  Next, you can use `opreport` to produce a report.  Suggested options are:
  # opcontrol --no-vmlinux
  # opcontrol --separate=kernel
  # opcontrol --init
  # opcontrol --reset
  # opcontrol --start


At this point, I can run programs.  The program that I want is found in 'build/examples/routing/ns3-dev-manet-routing-compare-optimized'.
  # opreport --exclude-dependent --demangle=smart --symbols --threshold=1 > opreport.out


  # cd build/examples/routing
This directs the output to a file. threshold is specified to reduce the amount of output statistics. Let's look at the beginning of this opreport.out file:
  # ./ns3-dev-manet-routing-compare-optimized


Once the program finished, I did:
  $ head -10 opreport.out
    
  CPU: Intel Ivy Bridge microarchitecture, speed 3900 MHz (estimated)
   # opcontrol --stop
  Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
   # opreport -l ./ns3-dev-manet-routing-compare-optimized > /tmp/opreport.out
  samples  %        symbol name
   # opcontrol --deinit
  1398      3.8657  ns3::InterferenceHelper::AppendEvent(ns3::Ptr<ns3::Event>, bool)
  1173      3.2436  _int_free
  1067      2.9504  pair<_Rb_tree_iterator<pair<ns3::Scheduler::EventKey const, ns3::EventImpl*>>, bool> ...
  1000      2.7652  __ieee754_log_avx
   981      2.7126  malloc
   929      2.5689  __ieee754_pow_sse2
   906      2.5053  ns3::ConstantVelocityHelper::Update() const
  833      2.3034  _int_malloc
   679      1.8776  ns3::YansWifiChannel::Send(ns3::Ptr<ns3::YansWifiPhy>, ns3::Ptr<ns3::WifiPpdu const>, double) const


This left a copy of my opreport in a temporary file. Let's look at the beginning of this file:
This illustrates that the ns-3 methods InterferenceHelper::AppendEvent(), ConstantVelocityHelper::Update(), and YansWifiChannel::Send() were sampled most frequently.  If you are experiencing long runtime performance problems, you may see a problematic method in the output consuming over half of the CPU clock cycles, and you can then try to debug (perhaps with logging or a debugger) why the program spends so much time there.


When you are done, you may want to remove the 'oprofile_data' directory since it is owned by root:


   # head -20 /tmp/opreport.txt
   $ sudo rm -rf oprofile_data
  CPU: Core 2, speed 2.4e+06 MHz (estimated)
  Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
  samples  %        image name              symbol name
  43062    3.8673  libm-2.18.so            __ieee754_pow_sse2
  40383    3.6267  libc-2.18.so            _int_malloc
  33561    3.0140  ns3-dev-manet-routing-compare-optimized ns3::MapScheduler::Insert(ns3::Scheduler::Event const&)
  31310    2.8119  libm-2.18.so            __ieee754_log_sse2
  31284    2.8095  ns3-dev-manet-routing-compare-optimized ns3::DcfManager::UpdateBackoff()
  30043    2.6981  ns3-dev-manet-routing-compare-optimized ns3::int64x64_t::Mul(ns3::int64x64_t const&)
  29855    2.6812  ns3-dev-manet-routing-compare-optimized ns3::DcfManager::MostRecent(ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time, ns3::Time) const
  29347    2.6356  ns3-dev-manet-routing-compare-optimized ns3::DcfManager::GetAccessGrantStart() const
  28877    2.5934  libc-2.18.so            _int_free
  25012    2.2463  libstdc++.so.6.0.19      std::_Rb_tree_increment(std::_Rb_tree_node_base*)
  23880    2.1446  ns3-dev-manet-routing-compare-optimized ns3::DcfManager::GetBackoffStartFor(ns3::DcfState*)
  21907    1.9674  ns3-dev-manet-routing-compare-optimized ns3::aodv::RoutingTable::Purge()
  20875    1.8747  ns3-dev-manet-routing-compare-optimized ns3::Ptr<ns3::Packet>::~Ptr()
  20265    1.8200  ns3-dev-manet-routing-compare-optimized ns3::DefaultSimulatorImpl::Now() const
  18734    1.6825  ns3-dev-manet-routing-compare-optimized ns3::InterferenceHelper::CalculatePer(ns3::Ptr<ns3::InterferenceHelper::Event const>, std::vector<ns3::InterferenceHelper::NiChange, std::allocator<ns3::InterferenceHelper::NiChange> >*) const
  18368    1.6496  libc-2.18.so            malloc
  17441    1.5663  ns3-dev-manet-routing-compare-optimized ns3::ConstantVelocityHelper::Update() const


Besides the event scheduling (the top-most ns-3 function listed above), the program spends a lot of time in the Wifi DCF code, purging the AODV routing table, in Wifi interference calculations, in mobility updates, etc...


= Additional references =
= Additional references =


* http://mailman.isi.edu/pipermail/ns-developers/2009-February/005290.html
* <s>http://mailman.isi.edu/pipermail/ns-developers/2009-February/005290.html</s> dead link at the moment
* https://groups.google.com/forum/?fromgroups=#!topic/ns-3-users/XkUrS_XMEUc
* https://groups.google.com/forum/?fromgroups=#!topic/ns-3-users/XkUrS_XMEUc

Latest revision as of 22:58, 21 March 2024

Main Page - Roadmap - Summer Projects - Project Ideas - Developer FAQ - Tools - Related Projects

HOWTOs - Installation - Troubleshooting - User FAQ - Samples - Models - Education - Contributed Code - Papers

This is a brief HOWTO on using oprofile to statistically sample the execution performance of an ns-3 program.

Background

There are several open source profilers, including gprof, oprofile, sysprof, and valgrind. This HOWTO focuses on oprofile, which is a good tool for ns-3 because ns-3 programs are logic-heavy with lots of small functions and templates, and a statistical profiler such as oprofile is more relevant than a profile that counts instructions such as gprof or valgrind.

We will be using operf and opreport; some documentation is found here:

Please read this tutorial on oprofile using opcontrol; it describes a bit behind the testing methodology: http://ssvb.github.com/2011/08/23/yet-another-oprofile-tutorial.html

The ns-3 manual also has a section on profiling tools with ns-3. Some of the tools have some CMake integration (but oprofile does not).

Use

oprofile is available as a package on Linux distros (e.g. 'apt install oprofile').

First, it is strongly recommended to (1) build ns-3 as a static library with optimizations, and (2) disable other examples and tests.

 ./ns3 configure -d optimized --enable-static

If you skip the "--enable-static", you will end up profiling your main program only and not the library code.

If you enable examples, you will use an enormous amount of disk space (25 GB or more) due to the static linking of all examples.

Your main program should be in the `scratch/` directory (which is still compiled even if examples are disabled).

Example usage

I'll use the program 'examples/wireless/manet-routing-compare.cc' as an example. First, copy the program to the scratch directory and rename it:

 $ cp examples/routing/manet-routing-compare.cc scratch/mrc.cc

Next, configure and build ns-3:

 $ ./ns3 clean
 $ ./ns3 configure -d optimized --enable-static
 $ ./ns3 build

Then run it with a program called `operf` which is part of oprofile. Since operf requires sudo privileges, and since the `ns3` script disallows sudo, we will recurse into the build directory and run the binary directly, as follows:

 $ cd build/scratch
 $ sudo operf ./ns3-dev-mrc-optimized

Note that if you are running a release of ns-3, your binary may be named differently such as `ns3.41-mrc-optimized`.

Operf will print out something like this:

 operf: Profiler started
 Profiling done.

You may see these warnings, and can probably ignore them:

 WARNING: Lost samples detected! See /path/to/ns-3-dev/oprofile_data/samples/operf.log for details.
 Lowering the sampling rate may reduce or eliminate lost samples.
 See the '--events' option description in the operf man page for help.

or

 WARNING! Some of the events were throttled. Throttling occurs when
 the initial sample rate is too high, causing an excessive number of
 interrupts.  Decrease the sampling frequency. Check the directory
 /path/to/oprofile_data/samples/current/stats/throttled
 for the throttled event names.

This leaves a sampling data directory in oprofile_data/ directory location. Next, you can use `opreport` to produce a report. Suggested options are:

 # opreport --exclude-dependent --demangle=smart --symbols --threshold=1 > opreport.out

This directs the output to a file. threshold is specified to reduce the amount of output statistics. Let's look at the beginning of this opreport.out file:

 $ head -10 opreport.out
 CPU: Intel Ivy Bridge microarchitecture, speed 3900 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
 samples  %        symbol name
 1398      3.8657  ns3::InterferenceHelper::AppendEvent(ns3::Ptr<ns3::Event>, bool)
 1173      3.2436  _int_free
 1067      2.9504  pair<_Rb_tree_iterator<pair<ns3::Scheduler::EventKey const, ns3::EventImpl*>>, bool> ...
 1000      2.7652  __ieee754_log_avx
 981       2.7126  malloc
 929       2.5689  __ieee754_pow_sse2
 906       2.5053  ns3::ConstantVelocityHelper::Update() const
 833       2.3034  _int_malloc
 679       1.8776  ns3::YansWifiChannel::Send(ns3::Ptr<ns3::YansWifiPhy>, ns3::Ptr<ns3::WifiPpdu const>, double) const

This illustrates that the ns-3 methods InterferenceHelper::AppendEvent(), ConstantVelocityHelper::Update(), and YansWifiChannel::Send() were sampled most frequently. If you are experiencing long runtime performance problems, you may see a problematic method in the output consuming over half of the CPU clock cycles, and you can then try to debug (perhaps with logging or a debugger) why the program spends so much time there.

When you are done, you may want to remove the 'oprofile_data' directory since it is owned by root:

 $ sudo rm -rf oprofile_data


Additional references