Statistical Framework for Network Simulation
This page outlines work on simulation data collection and a statistical framework for ns-3.
Primary objectives for this effort are the following:
- Provide functionality to record, calculate, and present data and statistics for analysis of network simulations.
- Boost simulation performance by reducing the need to generate extensive trace logs in order to collect data.
- Enable simulation control via online statistics, e.g. terminating simulations or repeating trials.
Derived sub-goals and other target features include the following:
- Integration with the existing ns-3 tracing system as the basic instrumentation framework of the internal simulation engine, e.g. network stacks, net devices, and channels.
- Enabling users to utilize the statistics framework without requiring use of the tracing system.
- Helping users create, aggregate, and analyze data over multiple trials.
- Support for user-created instrumentation, e.g. of application-specific events and measures.
- Low memory and CPU overhead when the package is not in use.
- Leveraging existing analysis and output tools as much as possible. The framework may provide some basic statistics, but the focus is on collecting data and making it accessible for manipulation in established tools.
- Eventual extension to assist in distributing independent replications is also important, though not included in the first round of features.
- 2008/05/30: A draft framework has been made available at . It includes the following features:
- Basic data collection framework, including metadata about the run being conducted.
- Two minimal data collectors: a counter and a min/max/avg/total observer.
- Extensions of those to easily work with times and packets.
- Plaintext output formatted for OMNeT++.
- Database output using sqlite3, a standalone, lightweight, high-performance SQL engine.
- An example built around a notional experiment that examines the properties of ns-3's default ad hoc WiFi performance. It incorporates the following:
- Construction of a two-node ad hoc WiFi network, with the nodes a parameterized distance apart.
- UDP traffic source and sink applications with slightly different behavior and measurement hooks than the stock classes.
- Data collection from the ns-3 core via existing trace signals, in particular data on frames transmitted and received by the WiFi MAC objects.
- Instrumentation of custom applications by connecting new trace signals to the stat framework, as well as via direct updates. Information is recorded about total packets sent and received, bytes transmitted, and end-to-end delay.
- An example of using packet tags to track end-to-end delay.
- A simple control script that runs a number of trials of the experiment at varying distances and queries the resulting database to produce a graph using gnuplot.
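The two minimal collectors listed above can be illustrated with a short sketch. This is not the framework's own code (which is part of ns-3); the class names `Counter` and `MinMaxAvgTotal`, the `update` method, and the `key` labels are hypothetical and chosen only for illustration. The point is that both collectors keep running summaries rather than storing every sample, which is what lets a simulation avoid writing extensive trace logs.

```python
import math


class Counter:
    """Counts events such as packets sent (hypothetical name, not the ns-3 API)."""

    def __init__(self, key):
        self.key = key      # label identifying the measure
        self.count = 0

    def update(self, n=1):
        self.count += n


class MinMaxAvgTotal:
    """Keeps running min, max, average, and total without storing samples."""

    def __init__(self, key):
        self.key = key
        self.n = 0
        self.total = 0.0
        self.min = math.inf
        self.max = -math.inf

    def update(self, value):
        self.n += 1
        self.total += value
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    @property
    def avg(self):
        # NaN signals "no samples yet" instead of raising on division by zero.
        return self.total / self.n if self.n else math.nan


# Illustrative use: one counter update and one delay sample per received packet.
sent = Counter("packets-sent")
delay = MinMaxAvgTotal("end-to-end-delay-seconds")
for d in (0.25, 0.5, 0.75):   # made-up delay samples in seconds
    sent.update()
    delay.update(d)
```

In a simulation, `update` would be called either from a callback connected to a trace signal or directly from instrumented application code.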
The framework is based around the following core principles:
- One experiment trial is conducted by one instance of a simulation program.
- Trials may run in parallel or serially, but a simulation is generally not "rebooted" inside one program or simply run for longer.
- A control script executes instances of the simulation, varying parameters as necessary.
- Data is collected and stored for plotting and analysis using external scripts and existing tools.
- Measures within the ns-3 core are taken by connecting the stat framework to existing trace signals.
- Trace signals or direct manipulation of the framework may be used to instrument custom simulation code.
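As a rough sketch of these principles, the script below plays the role of the control script: it runs one trial per parameter setting, stores the results in an SQLite database, and then queries that database for a per-distance summary. Everything named here is an assumption made for illustration: `run_trial` is a stand-in that fabricates counts instead of launching a simulation, and the `results` table layout is hypothetical and is not the schema used by the framework.

```python
import sqlite3


def run_trial(distance, run_id):
    """Stand-in for one simulation process; returns (sent, received).

    A real control script would launch the simulation program here, one
    process per trial. The arithmetic below is a placeholder, not a WiFi model.
    """
    sent = 30
    received = max(0, sent - distance // 50 - run_id)
    return sent, received


# Hypothetical schema: one row per (distance, run, variable).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE results (distance INTEGER, run INTEGER, variable TEXT, value INTEGER)"
)

# Vary the parameter across trials; each trial is independent of the others.
for distance in (50, 100, 150):
    for run_id in range(2):
        sent, received = run_trial(distance, run_id)
        db.executemany(
            "INSERT INTO results VALUES (?, ?, ?, ?)",
            [(distance, run_id, "sent", sent),
             (distance, run_id, "received", received)],
        )
db.commit()

# Post-processing is a query over stored data, not a pass over trace logs.
rows = db.execute(
    "SELECT distance, AVG(value) FROM results "
    "WHERE variable = 'received' GROUP BY distance ORDER BY distance"
).fetchall()
for distance, avg_received in rows:
    print(distance, avg_received)
```

The rows produced by the final query are the kind of data that could then be handed to an external plotting tool.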
The basic components of the framework and their interactions are depicted in the following figure.
Several components and packages have been made for ns-2 to collect and manage data and statistics. A variety of these are listed in the ns-2 wiki. The following are notes on particular efforts.
- ns2measure provides a data collection framework for ns-2 and support for calculating statistics over that data, including multiple runs. The main component is a global observer object incorporated into ns-2. Several generic types of measures are supported, e.g. time averaged and discrete rate. Observed samples are recorded via an explicit call to the observer object, identified by a measure label and particular identifier such as a flow or host. Post-simulation scripts provide for analyzing collected data and generating statistics. A control script is provided such that runs may be repeated until a statistical goal such as a confidence level is met. Data from independent runs may be incorporated in generation of the statistics.
- simd executes distributed, independent simulation runs and collects data from them. A set of Python scripts is used to push simulations out to client nodes, with a standardized set of scripts used to parameterize runs. Scripts are expected to produce output as comma-separated values, which are collected and concatenated by the master control script.
- ns-2/akaroa-2 provides support for executing distributed, independent replications, with significant statistical support for working with collected data and managing the runs. A master program runs on one computer and a set of clients on other machines that execute received simulations. Within each ns-2 instance a global observer is created. Samples are reported to that observer, which forwards them to the master computer. Measures are identified in simulation scripts by numeric identifiers and consist of particular observations, e.g. delay or packet size. The master program receives these observations and calculates statistics such as the mean and confidence interval over them. That data is used both for final output, and to conduct more simulations at the client machines if confidence is low. Another addition to ns-2 is incorporation of a different random number generator with better guarantees for independent streams.
- tracegraph is a MATLAB-based package for producing a wide variety of plots from trace files.
- rpi ns2graph provides several observation objects, some for generating traces to be used in graphing, others producing only summary statistics. A number of classes are provided for collecting data on common network statistics, such as round trip time. An API is also given for controlling graph output to a variety of tools, such as gnuplot.
- ns2 jtrana parses an ns-2 trace file into MySQL, and provides an interface to interrogate the database and produce graphs and other output in several formats. The database schema is largely a straightforward encoding of trace data.
- Samer Bali's scripts generate statistics from traces, including averages over multiple runs, for a number of measures.
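Two of the packages above (ns2measure and ns-2/akaroa-2) repeat runs until a statistical goal such as a confidence level is met. The sketch below illustrates that general idea only; it is not code from either package. The function name `run_until_confident`, its parameters, and the stopping rule (a normal-approximation 95% interval with z = 1.96 and a relative half-width target) are assumptions chosen for illustration, and `run_once` stands in for executing one independent replication.

```python
import math
import random
import statistics


def run_until_confident(run_once, rel_half_width=0.05, min_runs=5,
                        max_runs=1000, z=1.96):
    """Repeat independent runs until the confidence interval is tight enough.

    Stops when the half-width of the (normal-approximation) interval around
    the sample mean is at most rel_half_width * |mean|, or at max_runs.
    Returns (mean, half_width, number_of_runs).
    """
    samples = []
    mean = half_width = math.nan
    while len(samples) < max_runs:
        samples.append(run_once())
        if len(samples) < max(min_runs, 2):
            continue  # too few runs to estimate the variance
        mean = statistics.fmean(samples)
        half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
        if half_width <= rel_half_width * abs(mean):
            break
    return mean, half_width, len(samples)


# Illustrative use: each "run" yields one made-up delay observation.
rng = random.Random(1)
mean, half_width, runs = run_until_confident(lambda: rng.gauss(10.0, 1.0))
print(f"mean={mean:.3f} +/- {half_width:.3f} after {runs} runs")
```

For small numbers of runs a Student's t quantile would be more appropriate than the fixed z value; the fixed value is used here only to keep the sketch dependency-free.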
The following table charts five features of these packages:
- Run Mgmt: Whether or not the package provides support for conducting multiple trials and varying parameters.
- Data Mgmt: Whether or not the package helps manage data generated from multiple trials.
- Replication: Whether or not the package supports distributing trials across multiple hosts.
- Trace Analysis: Whether or not the package supports producing statistics from recorded trace logs.
- Runtime Obsv: Whether or not the package provides hooks to observe data and generate statistics during a trial.
|Package||Run Mgmt||Data Mgmt||Replication||Trace Analysis||Runtime Obsv|