Simulation Configuration Initial Thoughts
This is not a proposal of any kind. What follows are just some stream-of-consciousness thoughts on what could be done. To say that this project is in its infancy would be an understatement.
Configuring simulations across multiple, probably virtualized, machines requires a large amount of setup work to construct the component systems. For example, boxes with appropriate software must be put together, and dozens or perhaps hundreds of taps, bridges, etc. must be configured in order to construct a topology. The opportunity for human error to creep in during this process makes doing it by hand unworkable for all but the simplest topologies.
This real-world configuration information needs to be imported into ns-3, and the various systems and devices need to be mapped to ns-3 constructs. For example, if a number of Linux hosts are configured to participate in a wireless network using tap bridges, each ns-3 ghost node needs to know which host it is pretending to be and which device on that host it should be bridging to a wifi net device.
There is also a need to detect the overall state of a simulation and perform actions based on state transitions. In the case of real-time emulation, individual simulator processes must be brought up and executed, but the simulation proper cannot start until all of the ns-3 processes are ready to process events. Currently, the only way to do this is by manually running shell scripts and printing messages on a display when the system gets into the desired state. This is also unworkable for all but the simplest topologies.
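The "do not start until everyone is ready" requirement can be sketched as a small bookkeeping object. This is only an illustration: the class `ReadinessGate` and the process names `sim-0`, `sim-1`, `sim-2` are hypothetical, and how a process would actually report in over the network is left open.

```python
class ReadinessGate:
    """Tracks which simulator processes have reported ready (hypothetical helper)."""

    def __init__(self, expected):
        self.expected = set(expected)   # names of all processes we wait for
        self.ready = set()              # names that have checked in so far

    def report_ready(self, name):
        """Record one process as ready; return True once all have reported."""
        if name not in self.expected:
            raise KeyError(f"unknown simulator process: {name}")
        self.ready.add(name)
        return self.all_ready()

    def all_ready(self):
        return self.ready == self.expected

    def missing(self):
        """Processes we are still waiting for, in sorted order."""
        return sorted(self.expected - self.ready)


gate = ReadinessGate(["sim-0", "sim-1", "sim-2"])
gate.report_ready("sim-0")
gate.report_ready("sim-2")
still_waiting = gate.missing()           # sim-1 has not checked in yet
can_start = gate.report_ready("sim-1")   # the last process arrives
```

Only when `report_ready` returns True would a coordinator release the simulation proper, which replaces watching for messages on a display.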
Configuration Management Database
We also want to be able to allow GUI topology builders and simulation managers to exist on top of this tool. For example, it would be the best of all worlds if, in a GUI, one could create a node graphic using a mouse click and have a virtual machine created for the user that backs that graphic. Another click could create a wireless network. A few clicks could create a wifi network device on the node graphic which would, in turn, create a new device on the just-created virtual machine and a bridge into the system running the ns-3 process. The association between the tap device, the bridge and the ns-3 device would be saved and could be looked up in a global configuration management database (CMDB).
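As a rough sketch of what such an association might look like, the following stores one row per (ns-3 node, ns-3 device) pair and looks up the host, tap and bridge behind it. The table layout, the column names and the example values (`vm-left`, `tap0`, `br0`, `wifi0`) are all assumptions made for illustration; an in-memory SQLite database stands in for whatever CMDB is eventually chosen.

```python
import sqlite3


def open_cmdb():
    """Create an in-memory stand-in for the CMDB (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE binding (
               ns3_node   INTEGER NOT NULL,
               ns3_device TEXT    NOT NULL,
               host       TEXT    NOT NULL,
               tap        TEXT    NOT NULL,
               bridge     TEXT    NOT NULL,
               PRIMARY KEY (ns3_node, ns3_device)
           )"""
    )
    return db


def add_binding(db, ns3_node, ns3_device, host, tap, bridge):
    """Record which real host, tap and bridge back an ns-3 device."""
    db.execute(
        "INSERT INTO binding VALUES (?, ?, ?, ?, ?)",
        (ns3_node, ns3_device, host, tap, bridge),
    )


def lookup(db, ns3_node, ns3_device):
    """Return (host, tap, bridge) for an ns-3 device, or None if unknown."""
    return db.execute(
        "SELECT host, tap, bridge FROM binding "
        "WHERE ns3_node = ? AND ns3_device = ?",
        (ns3_node, ns3_device),
    ).fetchone()


db = open_cmdb()
add_binding(db, 0, "wifi0", "vm-left", "tap0", "br0")
add_binding(db, 1, "wifi0", "vm-right", "tap0", "br1")
```

With this in place, a ghost node can ask which host it is pretending to be with `lookup(db, 0, "wifi0")`.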
Automagic Script Generation
One could conceive of automatically generating scripts from the CMDB. For example, if a particular environment needs a wireless network, it doesn't sound horribly difficult to generate some C++ code to instantiate NodeContainers, DeviceContainers, WifiNetDevices, etc., from the configuration database and then create a real-time simulation that sits around waiting for something to happen (packets to arrive on devices). This code could be compiled and then distributed and run by a configuration management system (that is one of its primary jobs after all).
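A minimal sketch of the generation step, assuming the CMDB can hand back a node count and a list of (node, host, tap) bindings. The function `generate_script` and the example bindings are hypothetical. The emitted C++ is deliberately skeletal: it only creates the nodes and selects the real-time simulator, leaves device setup as comments, and is not compiled here, so the header names and everything omitted would need checking against the ns-3 release in use.

```python
def generate_script(node_count, bindings):
    """Return C++ source text for a skeletal real-time ns-3 script.

    bindings is a list of (ns3_node, host, tap) tuples taken from the CMDB.
    """
    lines = [
        '#include "ns3/core-module.h"',
        '#include "ns3/network-module.h"',
        "",
        "using namespace ns3;",
        "",
        "int main (int argc, char *argv[])",
        "{",
        '  GlobalValue::Bind ("SimulatorImplementationType",',
        '                     StringValue ("ns3::RealtimeSimulatorImpl"));',
        "  NodeContainer nodes;",
        f"  nodes.Create ({node_count});",
    ]
    # One placeholder comment per binding; real device setup is omitted.
    for node, host, tap in bindings:
        lines.append(
            f"  // node {node}: ghost of host '{host}', "
            f"bridged to '{tap}' (device setup omitted)"
        )
    lines += [
        "  Simulator::Run ();",
        "  Simulator::Destroy ();",
        "  return 0;",
        "}",
    ]
    return "\n".join(lines) + "\n"


source = generate_script(2, [(0, "vm-left", "tap0"), (1, "vm-right", "tap0")])
```

The resulting text in `source` could then be written to a file, compiled, and pushed out by the configuration management system.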
ns-3 State Detection
For state detection, there needs to be a service that lives in each ns-3 process and allows network programs to access the state of that process, so that a global ns-3 state can be constructed. By this, I mean something like "all simulations are running," or "all simulations have passed t=10 sec." It would be nice to be able to wait for events, such as "emulation device running" on a process, or perhaps "emulation device running on nodes [4-12]", etc.
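The global predicates quoted above could be built from per-process reports roughly as follows. The `ProcessState` record and its fields are hypothetical; the sketch assumes each ns-3 process can somehow report whether it is running, its current simulation time, and which nodes have an emulation device up.

```python
from dataclasses import dataclass


@dataclass
class ProcessState:
    """One ns-3 process's self-reported state (hypothetical record)."""
    name: str
    running: bool
    sim_time: float                          # simulation time in seconds
    nodes_with_emu_device: frozenset = frozenset()


def all_running(states):
    """'All simulations are running.'"""
    return all(s.running for s in states)


def all_passed(states, t):
    """'All simulations have passed t seconds.'"""
    return all(s.sim_time >= t for s in states)


def emu_running_on(states, node_ids):
    """'Emulation device running on nodes [a-b]', across all processes."""
    up = set().union(*(s.nodes_with_emu_device for s in states))
    return set(node_ids) <= up


states = [
    ProcessState("sim-0", True, 12.5, frozenset(range(4, 9))),
    ProcessState("sim-1", True, 9.0, frozenset(range(9, 13))),
]
```

Here `all_running(states)` holds, `all_passed(states, 10)` does not yet (sim-1 is at 9.0 s), and `emu_running_on(states, range(4, 13))` holds because the two processes together cover nodes 4 through 12.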
It would also be nice to be able to fire local and distributed events. For example, "when console button pushed, fire an event in each simulation that starts applications."
Google MapReduce provides named counters that might be a useful thing.
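To show the idea only, and not the API of any MapReduce implementation, here is a sketch in which each process keeps its own counters under agreed names and a coordinator sums them into global values. The counter names and numbers are made up.

```python
from collections import Counter


def merge_counters(per_process):
    """Sum same-named counters reported by every process."""
    total = Counter()
    for counters in per_process.values():
        total.update(counters)   # Counter.update adds counts together
    return total


reports = {
    "sim-0": {"packets_bridged": 150, "devices_up": 2},
    "sim-1": {"packets_bridged": 75, "devices_up": 3},
}
global_counters = merge_counters(reports)
```

A condition such as "all emulation devices are up" then becomes a comparison of `global_counters["devices_up"]` against the expected total.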
Another issue is simulation results. How does one collect results from dozens or perhaps hundreds of simulations? Right now, trace files are generated or statistics gathered. Trace files are saved in the local file system. If there are hundreds of local file systems, or a shared filesystem, how does the trace file get migrated back to the user in a rational way? Can results be saved to a database of some kind? How are the data named and managed?
Existing Partial Solutions
If you think about it, many large datacenters have similar problems. There may be dozens or hundreds of computer systems that must be properly configured. System administrators must be able to make configuration changes across large numbers of computers to, for example, upgrade software. Companies also have massive parallel algorithms running and the results must be collected and returned to users.
There are a number of existing open-source programs that already address these problems. Puppet is a configuration management system written in Ruby. Bcfg2 uses XML and Python. There are others.
The problem of running a distributed simulation and gathering results into one place seems similar to the problem solved by Hadoop and MapReduce. The map phase could involve configuration management and simulation state management to actually create and run the simulations, and the reduce phase could be used to collect and save the resulting simulation data.
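The analogy can be made concrete with a toy, single-process version of the two phases. Nothing here uses Hadoop; `map_phase`, `reduce_phase` and `fake_simulation` are hypothetical names, and the stand-in simulation just emits made-up statistics as (key, value) pairs.

```python
from collections import defaultdict


def map_phase(sim_ids, run_simulation):
    """Run each simulation and yield the (key, value) records it emits."""
    for sim_id in sim_ids:
        for key, value in run_simulation(sim_id):
            yield key, value


def reduce_phase(pairs, reducer):
    """Group records by key and collapse each group with reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reducer(values) for key, values in grouped.items()}


def fake_simulation(sim_id):
    """Stand-in for configuring and running one simulation (made-up data)."""
    return [
        ("packets_received", 10 * (sim_id + 1)),
        ("packets_dropped", sim_id),
    ]


totals = reduce_phase(map_phase(range(3), fake_simulation), sum)
```

In a real deployment the map step would be where machines are configured and simulations launched, and the reduce step would write the collected data somewhere the user can get at it.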
The Plan (TM)
I am currently investigating the existing solutions. I plan on installing at least one virtualization solution and seeing how well some of the open-source configuration management solutions can play in that space. I will try to configure some simulations of various complexity and see if I can create tap bridges, etc., and get the pieces of a reasonably sized simulation put together automagically.
- Puppet -- http://reductivelabs.com/trac/puppet/
- bcfg2 -- http://trac.mcs.anl.gov/projects/bcfg2
- Cfengine -- http://en.wikipedia.org/wiki/Cfengine
- PCfengine -- https://info.enstb.org/projets/pcfengine/en/index.html
- SmartFrog -- http://wiki.smartfrog.org/wiki/display/sf/SmartFrog+Home or
There are others ...
The next step will be to see if any of the existing solutions can be used to do any kind of state management. It would seem reasonable to expect that, since someone like Amazon or Expedia might have hundreds of servers and services to be managed, they would have some kind of automated service management function. For example, when service X comes up on machine Y, have it join the load balancing pool and begin servicing customer requests.
MapReduce implementations, for example, provide "named counters" that can be referred to globally.
Finally, I plan on playing with some of the existing general purpose distributed algorithm packages to see if they can be used to collect test results. I can picture that something like MapReduce could be used to automatically build networks of virtual machines and execute distributed simulations (Map) and then collect responses (Reduce).
- Hadoop -- http://hadoop.apache.org/mapreduce/ or http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
- MapReduce -- http://labs.google.com/papers/mapreduce-osdi04.pdf or http://en.wikipedia.org/wiki/MapReduce
- Cloudera Hadoop -- http://www.cloudera.com/
Hadoop provides potentially useful things like a distributed file system.
- Hadoop DFS -- http://developer.yahoo.com/hadoop/tutorial/
Amazon Web Services provides functions that span this whole space, from being able to define and control the instantiation of machine images to relational database services that might provide some insights.
- AWS -- http://aws.amazon.com/
- Amazon Simple Storage Service -- http://aws.amazon.com/s3/
- Amazon Elastic MapReduce -- http://aws.amazon.com/elasticmapreduce/
- Amazon Relational Database Service -- http://aws.amazon.com/rds/
Lots of organizations with similar problems seem to be using this kind of technology.
To paraphrase Richard Feynman: "similar problems have similar solutions". It seems plausible that ns-3 could use or extend existing technology to arrange simulations on large virtual machine clusters or on large testbeds, and to collect the resulting statistics.
It would really be nice to be able to lean on open-source widely-used packages that already do similar things. There's just no way that the ns-3 project could hope to develop all of this stuff from scratch.
Of course, every developer has his or her own idea of the perfect language, and so these pieces are coded in different languages. Puppet is a Ruby app, bcfg2 uses Python, Hadoop and SmartFrog are Java (and seem to be often used together). We may have to think about some alternate language expertise ...
It should be said, however, I really have no idea how stupid, naive or brilliant any of this is.
Craigdo 18:24, 2 November 2009 (UTC)