Difference between revisions of "SimulationConfiguration"

Revision as of 18:33, 2 November 2009

Simulation Configuration Initial Thoughts

This is not a proposal of any kind. What follows are just some stream of consciousness thoughts on what could be done. To say that this project is in its infancy would be an understatement.

Configuration Management

To run simulations across multiple, probably virtualized, machines, a large amount of configuration must be performed to construct the component systems. For example, boxes with appropriate software must be put together, and dozens or perhaps hundreds of taps, bridges, etc. must be configured in order to construct a topology. The opportunity for human error to creep in during this process makes doing it by hand essentially unworkable for all but the simplest topologies.

This real-world configuration information needs to be imported into ns-3, and the various systems and devices need to be mapped to ns-3 constructs. For example, if a number of Linux hosts are configured to participate in a wireless network using tap bridges, each ns-3 ghost node needs to know which host it is pretending to be and which device on that host it should be bridging to a wifi net device.

There is also a need to detect the overall state of a simulation and perform actions based on state transitions. In the case of real-time emulation, individual simulator processes must be brought up and executed, but the simulation proper cannot start until all of the ns-3 processes are ready to process events. Currently, the only way to do this is by manually running shell scripts and printing messages on a display when the system gets into the desired state. This is also unworkable for all but the simplest topologies.

Configuration Management Database

We also want GUI topology builders and simulation managers to be able to exist on top of this tool. For example, it would be the best of all worlds if, in a GUI, one could create a node graphic using a mouse click and have a virtual machine created for the user that backs that graphic. Another click could create a wireless network. A few clicks could create a wifi network device on the node graphic which would, in turn, create a new device on the just-created virtual machine and a bridge into the system running the ns-3 process. The association between the tap device, the bridge and the ns-3 device would be saved and could be looked up in a global configuration management database (CMDB).
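To make the CMDB idea a little more concrete, here is a minimal sketch of what saving and looking up one of those associations might look like. Everything in it is an assumption for illustration only: the table name `association`, its columns, the helper names, and the choice of SQLite (via Python's standard `sqlite3` module) are not part of any existing ns-3 tool.

```python
import sqlite3


def open_cmdb(path=":memory:"):
    """Open (or create) a toy CMDB. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS association ("
        " vm TEXT NOT NULL,"            # virtual machine backing the node graphic
        " tap TEXT NOT NULL,"           # tap device created for the ns-3 process
        " bridge TEXT NOT NULL,"        # bridge joining the VM device to the tap
        " ns3_node INTEGER NOT NULL,"   # ghost node index in the ns-3 script
        " ns3_device INTEGER NOT NULL," # net device index on that ghost node
        " PRIMARY KEY (ns3_node, ns3_device))"
    )
    return db


def record_association(db, vm, tap, bridge, ns3_node, ns3_device):
    """Save one VM/tap/bridge/ns-3 device association."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO association VALUES (?, ?, ?, ?, ?)",
            (vm, tap, bridge, ns3_node, ns3_device),
        )


def lookup(db, ns3_node, ns3_device):
    """Return (vm, tap, bridge) for an ns-3 device, or None if unknown."""
    return db.execute(
        "SELECT vm, tap, bridge FROM association"
        " WHERE ns3_node = ? AND ns3_device = ?",
        (ns3_node, ns3_device),
    ).fetchone()
```

A GUI would call something like `record_association` when the user clicks, and an ns-3 ghost node would call something like `lookup` at startup to find out which host and tap it stands in for.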

ns-3 State Detection

For state detection, there needs to be a service living in each ns-3 process that lets network programs query the state of that process, so that a global ns-3 state can be constructed. By this, I mean something like "all simulations are running," or "all simulations have passed t=10 sec." It would be nice to be able to wait for events, such as "emulation device running" on a process, or perhaps "emulation device running on nodes [4-12]," and so on.

It would also be nice to be able to fire local and distributed events. For example, "when console button pushed, fire an event in each simulation that starts applications."
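As a rough sketch of what "wait for a global state, then fire an event" could mean, assume each ns-3 process periodically reports its current simulation time to some coordinator. The `GlobalState` class and `all_passed` helper below are hypothetical names invented for this illustration; nothing like them exists in ns-3 today, and how the reports would actually travel over the network is left out entirely.

```python
class GlobalState:
    """Toy coordinator that builds a global view from per-process reports."""

    def __init__(self):
        self.sim_time = {}  # process name -> last reported simulation time (s)
        self.waiters = []   # (predicate, action) pairs that have not fired yet

    def when(self, predicate, action):
        """Register an action to run once predicate(sim_time) becomes true."""
        self.waiters.append((predicate, action))

    def report(self, process, t):
        """Record a report from one process and fire any satisfied waiters."""
        self.sim_time[process] = t
        still_waiting = []
        for predicate, action in self.waiters:
            if predicate(self.sim_time):
                action()  # each waiter fires at most once
            else:
                still_waiting.append((predicate, action))
        self.waiters = still_waiting


def all_passed(processes, t):
    """Predicate for 'all of these simulations have passed time t'."""
    def predicate(sim_time):
        # A process that has not reported yet has not passed t.
        return all(p in sim_time and sim_time[p] >= t for p in processes)
    return predicate
```

With this, "when all simulations have passed t=10 sec, start the applications" becomes `state.when(all_passed(names, 10.0), start_applications)`, where `start_applications` is whatever distributed event one wants to fire.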

Simulation Results

Another issue is simulation results. How does one collect results from dozens or perhaps hundreds of simulations? Right now, trace files are generated or statistics gathered. Trace files are saved in the local file system. If there are hundreds of local file systems, or a shared filesystem, how do the trace files get migrated back to the user in a rational way? Can results be saved to a database of some kind? How are the data named and managed?

Existing Partial Solutions

If you think about it, many large datacenters have similar problems. There may be dozens or hundreds of computer systems that must be properly configured. System administrators must be able to make configuration changes across large numbers of computers to, for example, upgrade software. Companies also have massive parallel algorithms running and the results must be collected and returned to users.

There are a number of existing open-source programs that already address these problems. Puppet is a system configuration management tool written in Ruby. Bcfg2 uses XML and Python. There are others.

The problem of running a distributed simulation and gathering results into one place seems similar to the problem solved by Hadoop and MapReduce. The map phase could involve configuration management and simulation state management to actually create and run the simulations, and the reduce phase could be used to collect and save the resulting simulation data.
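To show the shape of that idea, here is a tiny single-machine sketch using Python's built-in `map` and `functools.reduce` rather than Hadoop itself. The `map_run` function is a stand-in I made up: a real map step would configure and launch an ns-3 process, whereas this one just counts the trace lines it is handed. The run names and the "trace lines" are likewise placeholders.

```python
from functools import reduce


def map_run(run):
    """Map step: stand-in for running one simulation and summarizing its trace."""
    name, trace_lines = run
    return {name: len(trace_lines)}  # summary here is just a line count


def reduce_results(left, right):
    """Reduce step: merge two partial result sets into one."""
    merged = dict(left)
    merged.update(right)
    return merged


def collect(runs):
    """Run every simulation (map) and gather the results in one place (reduce)."""
    return reduce(reduce_results, map(map_run, runs), {})
```

Calling `collect([("sim-0", ["r 1", "r 2"]), ("sim-1", ["r 1"])])` returns one dictionary keyed by run name, which is the "results gathered into one place" part of the analogy.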

The Plan (TM)

I am currently investigating the existing solutions. I plan on installing at least one virtualization solution and seeing how well some of the open source configuration management solutions can play in that space. I will try to configure some simulations of various complexity and see if I can create tap bridges, etc., and get the pieces of a reasonably sized simulation put together automagically.

The next step will be to see if any of the existing solutions can be used to do any kind of state management. It would seem reasonable to expect that, since someone like Amazon or Expedia might have hundreds of servers and services to manage, they would have some kind of automated service management function. For example, when service X comes up on machine Y, have it join the load balancing pool and begin servicing customer requests.

Finally, I plan on playing with some of the existing general purpose distributed algorithm packages to see if they can be used to collect test results. I can picture that something like MapReduce could be used to automatically build networks of virtual machines and execute distributed simulations (Map) and then collect responses (Reduce).

I have no idea how stupid, naive or brilliant any of this is yet.



Craigdo 18:24, 2 November 2009 (UTC)