For a detailed description, see also this thread: http://mailman.isi.edu/pipermail/ns-developers/2008-July/004454.html
This approach analyses the current NS-3 architecture, identifies areas suitable for parallelization and builds the fundamental algorithms needed to achieve performance gains. The main goal is CPU-local parallelization, but a powerful architecture should also scale to large, distributed environments.
The approach should be universal and transparent for all major subsystems within the simulator. Therefore an additional abstraction layer should be introduced that hides all implementation details and makes it possible to disable the parallelization completely, or to substitute or enhance the algorithms. This layer is developed incrementally: the first usable results should show where an interface is suitable. The focus remains a working implementation!
- Literature study
- Basic parallelization and packet serialization/deserialization
- Synchronization approach
  - Node-local (CMP/SMP)
  - Distributed (MPI)
- Balance subsystem isolation (WHERE to split the NS3 system for parallelization)
- Clean parallelization layer with the following characteristics:
  - as little interaction with other subsystems as possible
  - minimal overhead
  - new technologies should be implementable without knowledge of the underlying algorithm (e.g. interference calculation for wireless nodes)
  - last but not least: the introduced algorithm should work as well on uniprocessor systems as on TOP500.org clusters! ;)
The current approach and its fundamental algorithm are based on a space-parallel paradigm. Nodes are grouped into subsets (called federates), where each subset represents one unit of execution (a thread, a local process or a distributed task, for example via MPI).
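The grouping of nodes into federates can be illustrated with a small sketch. This is not the project's partitioning algorithm; it assumes, purely for illustration, that every node has a 2-D position and that federates are formed by cutting the x-axis into stripes with roughly equal node counts. The names `partition_into_federates` and `cross_federate_links` are hypothetical.

```python
def partition_into_federates(positions, num_federates):
    """Assign each node to a federate by slicing the x-axis into stripes.

    positions: dict mapping node id -> (x, y); a hypothetical input format.
    Returns a dict mapping node id -> federate index (0 .. num_federates-1).
    """
    # Sort nodes by x coordinate (node id breaks ties deterministically).
    ordered = sorted(positions, key=lambda n: (positions[n][0], n))
    # Ceiling division: the number of nodes per stripe.
    stripe = -(-len(ordered) // num_federates)
    return {node: i // stripe for i, node in enumerate(ordered)}


def cross_federate_links(links, assignment):
    """Return the links whose two endpoints live in different federates.

    Only these links need packet and time synchronization between federates.
    """
    return [(a, b) for a, b in links if assignment[a] != assignment[b]]


positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
links = [(0, 1), (1, 2), (2, 3)]
assignment = partition_into_federates(positions, 2)
boundary = cross_federate_links(links, assignment)
```

In this chain of four nodes, only the link between node 1 and node 2 crosses a federate boundary. Keeping the number of such links small is what keeps the synchronization traffic between federates low.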
- Synchronization between federates (packets as well as time information) - 95 %
  - MPI nearly completed (roughly 95 %)
  - Outlook: shared-memory-based approach
- Time synchronization - 0 %
  - This concerns the question of how federates can execute events when they do not know whether a neighboring federate wants to execute an event earlier in the timeline. The main challenge is to minimize the synchronization overhead. The choice of a proper algorithm is critical. This is an open question and will be addressed after the data synchronization is completed.
- Input/output handling - 0 %
  - Currently the output is not synchronized when several instances of NS-3 are executed in parallel. This must be fixed! The idea is to introduce a final phase, after the simulation is done, in which the data is synchronized (e.g. sent to the main instance, rank 0 for MPI) and then written out. This could be one answer, but on the other hand it introduces additional overhead (time as well as data synchronization).
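The final output phase described in the last item could look roughly like the following sketch. It assumes that every federate keeps its own trace as a list of `(timestamp, message)` records that is already sorted by time, and that these lists have been collected on rank 0 (in an MPI run, for example with a gather operation). The collection step itself is not shown, and the name `merge_federate_outputs` is hypothetical.

```python
import heapq


def merge_federate_outputs(per_rank_records):
    """Merge per-rank, time-sorted trace records into one global trace.

    per_rank_records: list indexed by rank; each entry is a list of
    (timestamp, message) tuples already sorted by timestamp.
    Returns a list of (timestamp, rank, message) tuples in time order.
    """
    # Tag every record with the rank it came from.
    tagged = [
        [(t, rank, msg) for t, msg in records]
        for rank, records in enumerate(per_rank_records)
    ]
    # heapq.merge performs a k-way merge of sorted inputs; records with
    # equal timestamps are emitted in rank order.
    return list(heapq.merge(*tagged, key=lambda rec: rec[0]))


rank0 = [(0.0, "tx A"), (2.0, "rx C")]
rank1 = [(1.0, "rx A"), (2.0, "tx C")]
trace = merge_federate_outputs([rank0, rank1])
```

The merge itself is cheap because each input is already sorted. The additional overhead mentioned above comes mainly from shipping all records to rank 0.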
Suggestions from Mathieu Lacage
These were posted on the ns-developers mailing list: http://mailman.isi.edu/pipermail/ns-developers/2008-March/003829.html
- GloMoSim: A Library for Parallel Simulation of Large-scale Wireless Networks 
- Space-parallel network simulations using ghosts 
- Lock-free Scheduling of Logical Processes in Parallel Simulation 
- Learning Not to Share 
- Towards Realistic Million-Node Internet Simulations 
- A Generic Framework for Parallelization of Network Simulations