Revision as of 09:29, 29 July 2008


General

For a detailed description, you should also read this thread: http://mailman.isi.edu/pipermail/ns-developers/2008-July/004454.html

Project Background

This project analyses the current ns-3 architecture, identifies areas suitable for parallelization, and builds the fundamental algorithms needed to achieve performance gains. The main goal is CPU-local parallelization, but a powerful architecture should also scale to large, distributed environments.

The approach should be universal and transparent for all major subsystems within the simulator. Therefore, an additional abstraction layer should be introduced to hide all implementation issues and to make it possible to disable the parallelization completely, or to substitute or enhance the algorithms. This additional layer will be developed incrementally: the first usable results should show where an interface is suitable. The focus remains on a working implementation!

Project Plan

Literature study
Basic parallelization and packet serialization/deserialization
Synchronization approach
- Node local (CMP/SMP)
- Distributed (MPI)
Balance subsystem isolation (WHERE to split the ns-3 system for parallelization)
Clean parallelization layer with the following characteristics
- as little interaction with other subsystems as possible
- minimal overhead
- new technologies should be implementable without knowledge of the underlying algorithm (e.g. interference calculation for wireless nodes)
- last but not least: the introduced algorithm should scale well from uniprocessor systems up to TOP500.org clusters! ;)
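
The packet serialization/deserialization step of the plan can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the RemotePacket struct, the field layout and the function names are hypothetical and are not taken from the ns-3-para branch. The sketch also assumes that all federates run on machines with the same byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical message exchanged between two federates; not an ns-3 type.
struct RemotePacket
{
  uint64_t timestampNs;          // simulation time at which the packet arrives
  uint32_t destNodeId;           // node that receives the packet
  std::vector<uint8_t> payload;  // raw packet bytes
};

// Fixed header: timestamp (8 bytes), node id (4 bytes), payload length (4 bytes).
const std::size_t kHeaderSize = sizeof (uint64_t) + 2 * sizeof (uint32_t);

// Flatten a message into one contiguous buffer (host byte order).
std::vector<uint8_t> Serialize (const RemotePacket &p)
{
  assert (p.payload.size () <= std::numeric_limits<uint32_t>::max ());
  const uint32_t len = static_cast<uint32_t> (p.payload.size ());
  std::vector<uint8_t> buf (kHeaderSize + len);
  uint8_t *out = buf.data ();
  std::memcpy (out, &p.timestampNs, sizeof (p.timestampNs));
  out += sizeof (p.timestampNs);
  std::memcpy (out, &p.destNodeId, sizeof (p.destNodeId));
  out += sizeof (p.destNodeId);
  std::memcpy (out, &len, sizeof (len));
  out += sizeof (len);
  if (len > 0)
    {
      std::memcpy (out, p.payload.data (), len);
    }
  return buf;
}

// Rebuild a message from a buffer; throws if the buffer is malformed.
RemotePacket Deserialize (const std::vector<uint8_t> &buf)
{
  if (buf.size () < kHeaderSize)
    {
      throw std::runtime_error ("buffer shorter than header");
    }
  RemotePacket p;
  uint32_t len = 0;
  const uint8_t *in = buf.data ();
  std::memcpy (&p.timestampNs, in, sizeof (p.timestampNs));
  in += sizeof (p.timestampNs);
  std::memcpy (&p.destNodeId, in, sizeof (p.destNodeId));
  in += sizeof (p.destNodeId);
  std::memcpy (&len, in, sizeof (len));
  in += sizeof (len);
  if (buf.size () - kHeaderSize != len)
    {
      throw std::runtime_error ("payload length does not match buffer size");
    }
  p.payload.assign (in, in + len);
  return p;
}
```

A flat byte buffer like this could, for example, be handed to MPI_Send with the MPI_BYTE datatype; whether the branch transports packets this way is not stated here.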

Approach

The current approach and its fundamental algorithm are based on a space-parallel paradigm. Nodes are merged into subsets (called federates), where each subset represents one unit of work (think of it as a thread, a local process, or a distributed task, for example via MPI).
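
As a rough illustration of the space-parallel idea, the following sketch assigns nodes to federates by splitting the node ids into contiguous blocks of nearly equal size. The function names and the block-partitioning rule are assumptions chosen for this example; the actual assignment used in the branch may differ (for instance, it might follow the network topology).

```cpp
#include <cassert>
#include <cstdint>

// Map a node id to a federate index in [0, federateCount).
// Hypothetical helper: contiguous blocks of node ids go to the same federate.
uint32_t FederateOfNode (uint32_t nodeId, uint32_t nodeCount, uint32_t federateCount)
{
  assert (federateCount > 0);
  assert (nodeId < nodeCount);
  // Use 64-bit arithmetic so that nodeId * federateCount cannot overflow.
  return static_cast<uint32_t> ((static_cast<uint64_t> (nodeId) * federateCount) / nodeCount);
}

// A packet needs inter-federate synchronization only if its two endpoints
// live in different federates.
bool CrossesFederates (uint32_t srcNodeId, uint32_t dstNodeId,
                       uint32_t nodeCount, uint32_t federateCount)
{
  return FederateOfNode (srcNodeId, nodeCount, federateCount)
         != FederateOfNode (dstNodeId, nodeCount, federateCount);
}
```

With 10 nodes and 2 federates, nodes 0 to 4 land in federate 0 and nodes 5 to 9 in federate 1, so only traffic between the two halves has to cross a federate boundary.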

Usage

Dependencies

The underlying synchronization method is based on MPI, so you need some additional libraries to build the parallelized ns-3. On a Debian-based system, type

aptitude install libopenmpi1 libopenmpi-dev openmpi-common openmpi-bin

to install the required dependencies.

Compile Instructions

To compile the branch (ns-3-para), always prefix every ./waf command with "CXX=/usr/bin/mpicxx". This tells waf to replace the default compiler with an MPI wrapper compiler (which in turn calls the appropriate compiler). You end up with a line similar to the following to compile the branch:

CXX=/usr/bin/mpicxx ./waf configure && CXX=/usr/bin/mpicxx ./waf

Parallelized Simulation

Currently, no modifications to the simulated scenario files are required, with one exception: you must add the line

Simulator::EnableParallelSimulation();

in front of "Simulator::Run ();". If you do not add this line, the simulator behaves like a normal run. To start the simulation you must set up the MPI environment, which is done by executing the mpirun(1) command. To start the point-to-point-udp-discard scenario (bundled with ns-3), you could execute the following:

./waf --shell
mpirun --np 2 --mca btl \^udapl,openib  build/debug/examples/point-to-point-udp-discard

--np 2 means that two instances are spawned on the local machine
--mca btl \^udapl,openib tells MPI that you are not running over a low-latency bus system such as InfiniBand, which suppresses some warnings.

At the end you invoke the normal program, no magic here. That's all! Eventually, some wrapper scripts should be supplied, and the compile-time environment variables should be replaced by waf configure options.

Milestones

Synchronization between federates (packets as well as time information) - 95 %
- MPI nearly completed (let's say 95%)
- Outlook: shared memory based approach
Time synchronization - 0 %
- This relates to the question of how federates can act and execute events if they do not know whether a neighboring federate wants to execute an event earlier in the timeline. The main challenge is to keep the synchronization overhead to a minimum. The choice of a proper algorithm is of crucial importance. This is an open question and will be addressed after the data synchronization is completed.
Input/Output handling - 0 %
- Currently the output is not synchronized if several instances of ns-3 are executed in parallel. This must be fixed! The idea is to introduce a final phase after the simulation is done, in which the data is synchronized (e.g. sent to the main instance, rank 0 for MPI) and then output. This could be one answer, but on the other hand it introduces additional overhead (time as well as data synchronization).
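
To make the time synchronization question above more concrete, here is a minimal sketch of one classic conservative option based on lookahead. It is only an illustration of the problem, not the algorithm chosen for this project (that choice is still open). The NeighborState struct, the function names and the idea of using a fixed per-neighbor lookahead (for example, a link delay) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical view a federate keeps of one neighboring federate.
struct NeighborState
{
  uint64_t clockNs;      // last simulation time reported by the neighbor
  uint64_t lookaheadNs;  // minimum delay before its events can affect us
};

// Earliest time at which any neighbor could still deliver an event to us.
// With no neighbors, nothing can interfere and the bound is "infinity".
uint64_t SafeTime (const std::vector<NeighborState> &neighbors)
{
  const uint64_t kMax = std::numeric_limits<uint64_t>::max ();
  uint64_t safe = kMax;
  for (const NeighborState &n : neighbors)
    {
      assert (n.clockNs <= kMax - n.lookaheadNs);  // guard against overflow
      safe = std::min (safe, n.clockNs + n.lookaheadNs);
    }
  return safe;
}

// A local event may run only if no neighbor can still schedule something
// at or before its timestamp.
bool CanExecute (uint64_t eventTimeNs, const std::vector<NeighborState> &neighbors)
{
  return eventTimeNs < SafeTime (neighbors);
}
```

In this sketch a federate with neighbors at (clock 100, lookahead 20) and (clock 90, lookahead 50) may execute events up to, but not including, time 120. The overhead mentioned above comes from keeping clockNs up to date, which requires extra messages between federates.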

Profiling

Calltree Analysis

There are several profiling tools, and several areas of profiling are possible. This section discusses a call-graph-based approach via valgrind. The interpretation of the generated data is left to the reader.

First of all, you need the required dependencies: valgrind and kcachegrind (a kdelibs-based visualization tool). On a Debian-based system you can install these via

aptitude install valgrind kcachegrind

To visualize the calltree for a particular scenario file you invoke ns-3 like this:

./waf --run tcp-large-transfer --command-template="valgrind --tool=callgrind  --trace-children=yes --collect-jumps=yes %s"
kcachegrind callgrind.out.*

References

Parallelization Approaches

omnet++ [1]
The Georgia Tech Network Simulator [2]
Dartmouth SSF [3]

Synchronization/Communication

MPI
- OpenMPI [4]
- MPI on Multicore, an OpenMP Alternative? [5]

Literature

GloMoSim: A Library for Parallel Simulation of Large-scale Wireless Networks [6]
Space-parallel network simulations using ghosts [7]
Lock-free Scheduling of Logical Processes in Parallel Simulation [8]
Learning Not to Share [9]
Towards Realistic Million-Node Internet Simulations [10]
A Generic Framework for Parallelization of Network Simulations [11]


Parallel Simulations: Difference between revisions
