Parallel Simulations: Difference between revisions

From Nsnam
Jump to navigation Jump to search
(Mathieu's suggestions on parallel simulations)
 
Line 5: Line 5:
== Suggestions from Mathieu Lacage ==
== Suggestions from Mathieu Lacage ==


The first task is to decide what the scope of the project should be.
These were posted on the ns-developers mailing-list:
There are really two possible options:
http://mailman.isi.edu/pipermail/ns-developers/2008-March/003829.html
* provide a parallel tool for clusters, and, more generally, for
physically distributed systems
* provide a parallel tool for multicore systems (numa/smp)
 
Since I doubt very much that the same solution can be applied with
useful performance to both options, I really would urge interested
students to focus on one of the above use-cases.
 
Now, since I was asked for what what I would recommend, the following is
really a summary of one of the many implementation options you could
chose to pursue so, don't take this is as any kind of mandatory path.
This is merely a hint as to what might be implementable within the
requested timeframe and what might bring interesting and useful output
to users.
 
1) The basic idea is to implement a conservative distributed time
synchronization algorithm. The classic algorithm is described in
"Parallel simulation: parallel and distributed simulation systems" by
Richard M. Fujimoto. The fundamental assumption here is that the
simulation models make no use of global variables and that simulated
nodes are connected through channels which have a non-null minimum
propagation delay. This allows you to parallelize two events e1 and e2
generated on two simulated nodes (n1 and n2) if e2 > e1 + min_delay
(n1,n2) or if e1 > e2 + min_delay (n1,n2).
 
2) We could, of course, implement the classic algorithm in the classic
way to allow a user to build very large scale simulations by
distributing the simulated nodes on a large set of simulation machines.
This kind of parallelization is useful but it has a number of drawbacks,
the main one being that it requires the user to describe a static
partitioning of the simulated nodes: the total set of nodes is split in
a set of partitions, each of which contains a subset of the total set of
nodes. Then, each partition is scheduled to run on a single machine.
Once you have made this decision, you cannot undo it since you are not
using a language like java which allows code migration. In practice,
finding a partitioning which does not generate slowdowns is non-trivial.
 
3) So, from my perspective, a desirable solution would require that the
user does not have to describe a partition set and it would work really
hard to get automatically _some_ speedup from a multicore system.
 
A simple solution would be to re-use a classic synchronization algorithm
with a very fine-grained partitioning (i.e., each partition contains
exactly one node). Then, you could schedule each partition on one of the
cpus of the multicore machine until the synchronization algorithm tells
you that this partition has no events to schedule anymore. At this
point, you can pick another available partition to schedule it on one of
the available cpus.
 
Pluses:
* no user-visible partitioning on a single multicore machine.
* can be re-used if you have multiple multicore machines to schedule
events within each multicore machine. So, you can reuse this work if you
want to provide a more generic distributed simulation solution.
 
Minuses:
* have no idea if you will get any kind of measurable speedup and you
will need to work hard to optimize local cache thrashing, etc.
 
4) How do you implement this ?
 
The core synchronization algorithm should live in the 'simulator'
module: src/simulator/simulator.cc should basically be replaced by a new
version which performs the synchronization/partition scheduling. It is
trivial to make this code dynamically replaceable by exporting something
like SimulatorImpl defined src/simulator/simulator.cc so, I will skip
this detail.
 
The biggest problem from an API point of view is that you need to be
able to tell which partition an event belongs to. Traditionally, this is
not a problem because there is a single partition per physical
machine/address-space. However, here, we have multiple partitions within
a single address-space (physical machine). Since every partition is made
of a single Node, we just need to tell which node an event is coming
from. There are many options, but here are two:
* Simulator::ScheduleWithObject (Ptr<Object> node, Time delay,
function, ...);
* Node::Schedule (Time delay, function, ...);
 
And, then, you need to make sure that all your models use these
scheduling functions rather than the functions they are now using.
 
5) A small tricky detail to get right from an algorithmic perspective is
how you will handle events which expire at the same time. There are two
issues to deal with here:
* you need to ensure that the parallel simulations provide a
deterministic ordering of these events.
* you need to ensure that the deterministic ordering of these events
is the same in both non-parallel and parallel simulations to ensure that
models are trivially portable from a non-parallel simulation to a
parallel one.
 
I suspect that there are multiple solutions to this problem, but here is
an ordering algorithm which should solve this problem:
* all same-time events within a single node should be scheduled FIFO.
* if you have two same-time events on node A and B, and if
A->GetNodeId () < B->GetNodeId (), then, the event on A should be
scheduled first.
 
This means that you need to add to the event ordering key the nodeid as
a secondary ordering key (in fact, there are already 2 keys in there,
so, this would be a third ordering key)

Revision as of 14:56, 24 March 2008