Data Collection Framework

Scope

The Data Collection Framework is responsible for monitoring, on-line reduction, and storage of simulation data.

Architecture and API Proposal

The simulation data collection process is a three-stage pipeline (Felipe, I am sorry for badly misusing your terms):

1. Models report events using the existing trace source mechanism.

2. A number of probes are set up to listen to interesting events. Probes are responsible for on-line data normalization and reduction. Probes report reduced data using the trace source mechanism, which allows the user to organize probes into trees.

3. One or more collectors are set up to listen to interesting probes. Collectors are responsible for storing data in some meaningful format.

The following design restrictions are suggested:

1. Probes should not generate data by themselves. This restriction allows probes to be reusable.

2. Collectors should not change the data they receive from probes. This restriction allows collectors to be reusable.

A proof-of-concept implementation of this architecture is located here: http://codereview.appspot.com/3105042 and is explained in some detail below.


Events & Probes

Events can be of any type (i.e., any trace source signature); in fact, every trace source can be considered an event source for the data collection framework.

To be visible to the data collection framework, an event must be listened to by some probe object. It is the probe's responsibility to match the event signature and produce some meaningful output. The simplest probe can listen for (void) events and count them.

Probes report their output data streams as trace sources; e.g., a probe reporting a {time, double} time series can have an "Output" trace source with a (Time, double) signature. This allows the user to create probe trees. Note that while the number of probe inputs is not restricted here, it is recommended that every probe have a single output.
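For illustration, a minimal counting probe might look like the following sketch. The class name CountProbeSketch and the direct use of a TracedCallback member are assumptions, loosely modelled on the proof-of-concept CountProbe; a real probe would register its "Output" trace source through its TypeId.

 #include "ns3/object.h"
 #include "ns3/traced-callback.h"
 
 // Sketch of a minimal counting probe.  Its input sink matches (void) events;
 // every event increments a counter and the new value is re-reported on the
 // probe's own output trace source, so that further probes or collectors can
 // be chained to it.
 class CountProbeSketch : public ns3::Object
 {
 public:
   // Connect this method to any trace source with an empty (void) signature.
   void TraceSink (void)
   {
     m_count += 1.0;
     m_output (m_count);   // re-report the reduced value
   }
 
 private:
   double m_count = 0.0;
   ns3::TracedCallback<double> m_output;   // would be registered as "Output"
 };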

The following probe hierarchy is implemented in the proof-of-concept example:

- Probe 
 - DoubleProbe
  - CountProbe
  - DivProbe
 - StringProbe
  - AttributeProbe

It is assumed that a large number of "basic" probes will be created, implementing all the simple data reduction/normalization operations. Users can add their own probes as well.

Collectors

Collectors are responsible for listening to probes and storing data in some user-meaningful format. In the current implementation the collector also owns all registered probes (i.e., controls their lifetime). Only supported probe types can be registered on a collector, because the collector must know how to store the probes' output data. This is the only place in the data collection pipeline where the data type is restricted.

The CsvCollector in the proof-of-concept example supports string and double time series (StringProbe and DoubleProbe and their subclasses) and stores every registered probe as a single CSV file. Many more useful collectors can be created.
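As an illustration, wiring a probe into a collector might look like the sketch below. CountProbe and CsvCollector are the proof-of-concept classes mentioned above; the Register method name, the Config path, and the omitted includes for those classes are assumptions.

 #include "ns3/core-module.h"
 
 // Hypothetical wiring of the three-stage pipeline: a model trace source feeds
 // a probe, and the probe is registered with a collector that owns it and
 // writes its output to a CSV file.  In practice the probe's sink signature
 // must match the trace source signature of the chosen Config path.
 Ptr<CountProbe> probe = CreateObject<CountProbe> ();
 Config::ConnectWithoutContext (
     "/NodeList/0/DeviceList/0/$ns3::PointToPointNetDevice/MacTx",
     MakeCallback (&CountProbe::TraceSink, probe));
 
 Ptr<CsvCollector> collector = CreateObject<CsvCollector> ();
 collector->Register ("mac-tx-count.csv", probe);   // collector now owns the probe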

There can be more than one collector in a running application. I can imagine one collector saving output data series, another one handling transient detection and run termination, and the last one saving mobility/topology events in a visualization-compatible format.

Informal requirements

This is a place to collect functional requirements, examples of intended use and expected behavior for the data collection framework.

Null

I don't want to collect any data from my simulation. In this case I expect that I don't need to use (or even know about) the data collection framework. I also expect no notable performance degradation of my legacy ns-3 applications.

Global data collection window

I want to set up data collection start and stop times. All events before the start and after the stop will be ignored by the framework.

Local data collection window

I want to be able to individually change the data collection window for every enabled data source.

Local scalar counter

I have some interesting event in my model and I have a number of model instances. I want the data collection framework to count the number of events and, at any time in the simulation, give access to per-instance values of the counter for some (maybe all) model instances. I expect that to do this I will need a single line of code for every model instance being monitored, or one line to enable data collection from all already created instances. I expect that the data collection configuration will give me some "handles" to address counter values for individual model instances. I'd like a simple way to apply avg, sum, min, max, var, ... functions to the vector of these "handles", as well as save all values to the output database.

Global scalar counter

As above, but I just want to access the sum of the counter values over all model instances at the end of the simulation. I expect that to do this I need to add a single line of code to configure data collection and one more line to configure how this data will be stored. I want to both store this single value to the output database and access it as a variable from the code.

Local vector double

As above, I have an interesting event in my model (e.g. a packet was queued), but this time I have some additional double value for that event (e.g. the packet size). All events in the data collection window result in a sample of (time, double) pairs for every model instance. I want to access (iterate over, store to the output database, apply a function to) this sample.

Global vector double

As above, but I don't care which model instance fires the events. I want to access the global (time, double) sample produced by several (maybe all) model instances.

Vector to scalar reduction

I want to automatically apply some function to a vector double statistic. The following functions must be built-in:

  • sample size
  • sample total time
  • last value, last timestamp
  • sum
  • sample average (\sum x / size)
  • time average (\sum (x * \delta t) / time)
  • sum / time
  • min, max, var

I should be able to write my own reduction function. The function output can be accessed as a scalar double, like the counter above. Since vectors can be huge, I need an option to keep only the reduced value in memory / the output database. An arbitrary number of reduced scalars can be obtained from the same vector.
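For instance, here is a small worked example of these reductions, assuming each sample holds its value until the next sample (or the end of the collection window). Take a collection window of [0, 4] seconds and the vector {(0, 2.0), (1, 4.0), (3, 1.0)}; then:

  • sample size    = 3
  • last value, last timestamp = 1.0, 3
  • sum            = 2.0 + 4.0 + 1.0 = 7.0
  • sample average = 7.0 / 3 ≈ 2.33
  • time average   = (2.0*1 + 4.0*2 + 1.0*1) / 4 = 2.75
  • sum / time     = 7.0 / 4 = 1.75
  • min, max       = 1.0, 4.0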

Use case: wifi throughput

The classical experiment of measuring aggregate wifi throughput a la Bianchi is an illustration of the reduced (sum/time) global double (packet size received at the wifi MAC) use case.

Use case: wifi channel utilization

I want to calculate per-device wifi channel utilization, defined as the fraction of time the PHY was not IDLE. To do this, I can add a new trace source with a single double argument and call it with 1.0 when the PHY leaves the IDLE state and with 0.0 when the PHY becomes IDLE. Then I apply the "time average" reduction (reduced local double). Alternatively, I can do this in a non-intrusive way by writing an adapter class which listens to PHY state change events of the form (old_state, new_state) and produces data source events as above.
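A minimal sketch of such an adapter is shown below; the class name, the use of the (context, start, duration, state) PHY state trace signature, and the bare TracedCallback member are assumptions (a real adapter would register m_output as a trace source).

 #include <string>
 #include "ns3/object.h"
 #include "ns3/traced-callback.h"
 #include "ns3/nstime.h"
 #include "ns3/wifi-phy.h"
 
 // Sketch of a non-intrusive adapter: it listens to WifiPhy state changes and
 // emits 1.0 whenever the PHY is busy and 0.0 when it is IDLE, so that a
 // "time average" reduction over this stream yields the channel utilization.
 class PhyBusyAdapter : public ns3::Object
 {
 public:
   void PhyStateTrace (std::string context, ns3::Time start, ns3::Time duration,
                       ns3::WifiPhy::State state)
   {
     m_output (state == ns3::WifiPhy::IDLE ? 0.0 : 1.0);
   }
 
 private:
   ns3::TracedCallback<double> m_output;   // would be registered as "Output"
 };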

Use case: average MANET node degree

I want to access the time- and node-averaged node degree (number of neighbors) in some particular MANET scenario using OLSR routing. To do this, I can add a new double data source "Number of neighbors" to the OLSR protocol, fire it whenever the number of neighbors changes, and reduce these per-node vectors to per-node time averages. At the end of the simulation I will apply the "avg" function to the resulting set of per-node values to get the node- and time-averaged degree.

Vector resampling

I want to automatically convert my "raw" vector of (time, double) samples to a new vector of (time, double) samples in such a way that the new time values are strictly periodic and _globally_ synchronous. Example: the vector of samples {(0, x1), (1, x2), (3.5, x3), (4.1, x4)} is resampled to the vector {(0, y1), (2, y2), (4, y4)}. The set of original samples inside the same time "slot" of the resampled vector produces one value which is representative of that "slot". The resampled value is computed in the same way as a whole vector is reduced to a scalar, see above. The same built-in reduction functions are supported:

  • count (= number of events in the slot, = sub-sample size)
  • last value, last timestamp
  • sum
  • sub-sample average (\sum x / size)
  • time average (\sum (x * \delta t) / time)
  • sum / time
  • min, max, var

User-specified functions are supported too. The size of the time slot for resampling is restricted to be the same for _all_ data vectors (a global framework parameter), and all resampled vectors are synchronous within the corresponding data collection windows. Vectors resampled in this way can be compared slot-by-slot.
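Continuing the example above, and assuming a slot size of 2 and half-open slots [t, t+2), the original samples fall into the slots as {x1, x2}, {x3}, {x4}, so for instance:

  • with "count":      {(0, 2), (2, 1), (4, 1)}
  • with "sum":        {(0, x1+x2), (2, x3), (4, x4)}
  • with "last value": {(0, x2), (2, x3), (4, x4)}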

Functions on vectors and pipelines

I want to apply a function to several (in general, resampled) vectors to produce a single new vector. This vector can be used in the same way as the original ones: fed to the input of another function, reduced to a scalar, or written to the output database.

Use case: moving window average

I can apply a moving window average (the window size is a multiple of the time slot) to detect the end of a transient process in my observable.
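For instance, with a window of W slots over a resampled vector x_1, x_2, ..., a simple unweighted moving average would be (a minimal formulation; weighted variants are also possible):

  y_k = (x_{k-W+1} + ... + x_k) / W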

Use case: packet delivery ratio

I want to measure the PDR of a single CBR stream. The CBR stream is implemented as two applications: a traffic source and a traffic sink. The source application fires a "packet send" event; this produces a vector of counters of the form (time, 1.0). The sink application fires a "packet received" event; this produces another vector of counters. Both vectors are resampled with the "count" function. The resampled vectors are inputs to the "a/b" function, and the output vector is the time-dependent packet delivery ratio.
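In other words, if s_k and r_k denote the "count"-resampled send and receive vectors, then for each slot k:

  PDR_k = r_k / s_k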

Use case: ITU R-factor

As above, but I want to automatically measure the ITU E-model R-factor, which is a function of the PDR and the average packet delay. To do this, the sink application from above also fires "packet received" events with a double delay parameter. This vector is resampled and fed to the R-factor calculation module together with the PDR vector. The total pipeline looks like this:

        Packet send
Source: ---------------\
                       |  Packet delivery ratio
        Packet recvd   |--------------------------\
Sink  : ---------------/                          |  R-factor
        delay                                     |-----------------
        ------------------------------------------/

Local factor

I want to access per-instance model attributes using the data collection framework on the same basis as the counter above, e.g. to store them to the output database as factors (inputs) of my experimental plan. Attributes can be integers/doubles, strings (e.g. "11Mbps" or "Minstrel"), or booleans (true/false or on/off semantics), and the last value will be recorded if the attribute changes with time.

Global factor

In my simulation all instances of the same model have the same value of some particular attribute (e.g. the same Slot value for all WifiMac instances). I want to access this value as above using the model's TypeId. I want to know what will happen if different instances of the model do have different values for this attribute.

March 2013 code review

This is just a placeholder on the wiki to store some documentation related to the code review (please discuss on ns-3-reviews or within the code review issue): https://codereview.appspot.com/7436051/

July 2013 code review

The code review issue has been updated: https://codereview.appspot.com/10974043

The data collection code is intended for the src/stats directory. New ns-3 manual documentation is posted here:

Use cases

This section describes a few use cases that the framework is intended to be able to support (not all capabilities are implemented yet).

interface bandwidth statistics

A user previously asked on ns-3-users:


   Basically, what is suggested to take a percentage of how much
   bandwidth is taken in a point to point link? Is flowmonitor the
   right tool for this, because I have gone through the documentation
   of the module however I am not sure that I require all that
   complexity. basically, I would like to simple take bandwidth
   measurements at any given point in time during the simulation of
   certain designated links.

One could envision some kind of PointToPointHelper methods to print this information out, using BasicStatsCollector.

 /* Plot sending and receiving throughput, averaged at 1 second
    intervals, in a Gnuplot */
 PointToPointHelper::PlotInterfaceThroughput (Ptr<NetDevice> nd)
 PointToPointHelper::PlotInterfaceThroughputAll (const NetDeviceContainer &ndc)
 PointToPointHelper::WriteInterfaceThroughput ...  /* File variant */
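A hypothetical call pattern for such methods might look like this (the Plot* methods above do not exist yet; this is only a sketch of the intended usage):

 NodeContainer nodes;
 nodes.Create (2);
 
 PointToPointHelper p2p;
 NetDeviceContainer devices = p2p.Install (nodes);
 
 // Hypothetical calls to the methods sketched above: plot the throughput of a
 // single device, or of every device in the container, averaged over 1 second
 // intervals.
 p2p.PlotInterfaceThroughput (devices.Get (0));
 p2p.PlotInterfaceThroughputAll (devices);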

athstats helper

There is an AthstatsHelper class that prints out formatted text statistics of the WifiNetDevice, corresponding to what the athstats tool might print out:

 examples/wireless/wifi-ap.cc:  AthstatsHelper athstats;
 examples/wireless/wifi-ap.cc:  athstats.EnableAthstats ("athstats-sta", stas);
 examples/wireless/wifi-ap.cc:  athstats.EnableAthstats ("athstats-ap", ap);

This was written before the data collection framework existed. It consists of a helper class that hooks a number of traces in the WifiNetDevice, collects statistics from them, periodically writes the statistics out to a file, and then resets them:

   void DevTxTrace (std::string context, Ptr<const Packet> p);
   void DevRxTrace (std::string context, Ptr<const Packet> p);
   void TxRtsFailedTrace (std::string context, Mac48Address address);
   void TxDataFailedTrace (std::string context, Mac48Address address);
   void TxFinalRtsFailedTrace (std::string context, Mac48Address address);
   void TxFinalDataFailedTrace (std::string context, Mac48Address address);
   void PhyRxOkTrace (std::string context, Ptr<const Packet> packet, double snr, WifiMode mode, enum WifiPreamble preamble);
   void PhyRxErrorTrace (std::string context, Ptr<const Packet> packet, double snr);
   void PhyTxTrace (std::string context, Ptr<const Packet> packet, WifiMode mode, WifiPreamble preamble, uint8_t txPower);
   void PhyStateTrace (std::string context, Time start, Time duration, enum WifiPhy::State state);

The print to file is just a formatted printf:

 snprintf (str, 200, "%8u %8u %7u %7u %7u %6u %6u %6u %7u %4u %3uM\n"

In the context of the data collection framework, this helper is analogous to a custom Collector object that hooks directly to trace sources (without probes) and also contains its own file-writing support (that is, it is a combined Collector+Aggregator).

The DCF way to write this would be as follows. First, if probes were available for these trace signatures, they could be used, but this is not strictly necessary. The Athstats helper could still be written largely as a custom collector, but the file-handling aspects could perhaps be handled by a stock file aggregator object, to which the specially formatted printf string could be provided.
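A rough sketch of that last step, assuming the stats module's FileAggregator and its FORMATTED mode (the two statistics and the format string are placeholders for the athstats fields):

 #include "ns3/file-aggregator.h"
 
 // The custom athstats-like collector would hand its per-interval summary
 // values to a stock FileAggregator instead of calling snprintf itself.
 ns3::Ptr<ns3::FileAggregator> agg =
   ns3::CreateObject<ns3::FileAggregator> ("athstats.txt",
                                           ns3::FileAggregator::FORMATTED);
 agg->Set2dFormat ("%8.0f %8.0f");   // e.g. tx frames and rx frames per interval
 
 // Inside the collector, once per reporting interval:
 //   agg->Write2d ("athstats", txCount, rxCount);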

object start/stop time tracker

Development road map

This section was last edited July 27, 2013

We are hoping to merge the initial portion of the data collection framework for ns-3.18 (the August release).


Background and related work