Difference between revisions of "Parallel Simulations"

From Nsnam
Jump to: navigation, search
(Mathieu's suggestions on parallel simulations)
 
(Suggestions from Mathieu Lacage)
Line 5: Line 5:
 
== Suggestions from Mathieu Lacage ==
 
== Suggestions from Mathieu Lacage ==
  
The first task is to decide what the scope of the project should be.
+
These were posted on the ns-developers mailing-list:
There are really two possible options:
+
http://mailman.isi.edu/pipermail/ns-developers/2008-March/003829.html
* provide a parallel tool for clusters, and, more generally, for
+
physically distributed systems
+
* provide a parallel tool for multicore systems (numa/smp)
+
 
+
Since I doubt very much that the same solution can be applied with
+
useful performance to both options, I really would urge interested
+
students to focus on one of the above use-cases.
+
 
+
Now, since I was asked for what what I would recommend, the following is
+
really a summary of one of the many implementation options you could
+
chose to pursue so, don't take this is as any kind of mandatory path.
+
This is merely a hint as to what might be implementable within the
+
requested timeframe and what might bring interesting and useful output
+
to users.
+
 
+
1) The basic idea is to implement a conservative distributed time
+
synchronization algorithm. The classic algorithm is described in
+
"Parallel simulation: parallel and distributed simulation systems" by
+
Richard M. Fujimoto. The fundamental assumption here is that the
+
simulation models make no use of global variables and that simulated
+
nodes are connected through channels which have a non-null minimum
+
propagation delay. This allows you to parallelize two events e1 and e2
+
generated on two simulated nodes (n1 and n2) if e2 > e1 + min_delay
+
(n1,n2) or if e1 > e2 + min_delay (n1,n2).
+
 
+
2) We could, of course, implement the classic algorithm in the classic
+
way to allow a user to build very large scale simulations by
+
distributing the simulated nodes on a large set of simulation machines.
+
This kind of parallelization is useful but it has a number of drawbacks,
+
the main one being that it requires the user to describe a static
+
partitioning of the simulated nodes: the total set of nodes is split in
+
a set of partitions, each of which contains a subset of the total set of
+
nodes. Then, each partition is scheduled to run on a single machine.
+
Once you have made this decision, you cannot undo it since you are not
+
using a language like java which allows code migration. In practice,
+
finding a partitioning which does not generate slowdowns is non-trivial.
+
 
+
3) So, from my perspective, a desirable solution would require that the
+
user does not have to describe a partition set and it would work really
+
hard to get automatically _some_ speedup from a multicore system.
+
 
+
A simple solution would be to re-use a classic synchronization algorithm
+
with a very fine-grained partitioning (i.e., each partition contains
+
exactly one node). Then, you could schedule each partition on one of the
+
cpus of the multicore machine until the synchronization algorithm tells
+
you that this partition has no events to schedule anymore. At this
+
point, you can pick another available partition to schedule it on one of
+
the available cpus.
+
 
+
Pluses:
+
* no user-visible partitioning on a single multicore machine.
+
* can be re-used if you have multiple multicore machines to schedule
+
events within each multicore machine. So, you can reuse this work if you
+
want to provide a more generic distributed simulation solution.
+
 
+
Minuses:
+
* have no idea if you will get any kind of measurable speedup and you
+
will need to work hard to optimize local cache thrashing, etc.
+
 
+
4) How do you implement this ?
+
 
+
The core synchronization algorithm should live in the 'simulator'
+
module: src/simulator/simulator.cc should basically be replaced by a new
+
version which performs the synchronization/partition scheduling. It is
+
trivial to make this code dynamically replaceable by exporting something
+
like SimulatorImpl defined src/simulator/simulator.cc so, I will skip
+
this detail.
+
 
+
The biggest problem from an API point of view is that you need to be
+
able to tell which partition an event belongs to. Traditionally, this is
+
not a problem because there is a single partition per physical
+
machine/address-space. However, here, we have multiple partitions within
+
a single address-space (physical machine). Since every partition is made
+
of a single Node, we just need to tell which node an event is coming
+
from. There are many options, but here are two:
+
* Simulator::ScheduleWithObject (Ptr<Object> node, Time delay,
+
function, ...);
+
* Node::Schedule (Time delay, function, ...);
+
 
+
And, then, you need to make sure that all your models use these
+
scheduling functions rather than the functions they are now using.
+
 
+
5) A small tricky detail to get right from an algorithmic perspective is
+
how you will handle events which expire at the same time. There are two
+
issues to deal with here:
+
* you need to ensure that the parallel simulations provide a
+
deterministic ordering of these events.
+
* you need to ensure that the deterministic ordering of these events
+
is the same in both non-parallel and parallel simulations to ensure that
+
models are trivially portable from a non-parallel simulation to a
+
parallel one.
+
 
+
I suspect that there are multiple solutions to this problem, but here is
+
an ordering algorithm which should solve this problem:
+
* all same-time events within a single node should be scheduled FIFO.
+
* if you have two same-time events on node A and B, and if
+
A->GetNodeId () < B->GetNodeId (), then, the event on A should be
+
scheduled first.
+
 
+
This means that you need to add to the event ordering key the nodeid as
+
a secondary ordering key (in fact, there are already 2 keys in there,
+
so, this would be a third ordering key)
+

Revision as of 14:56, 24 March 2008

Main Page - Current Development - Developer FAQ - Tools - Related Projects - Project Ideas - Summer Projects

Installation - Troubleshooting - User FAQ - HOWTOs - Samples - Models - Education - Contributed Code - Papers

Link back to Suggested Projects page

Suggestions from Mathieu Lacage

These were posted on the ns-developers mailing-list: http://mailman.isi.edu/pipermail/ns-developers/2008-March/003829.html