Data Collection Framework

Scope

The Data Collection Framework is responsible for simulation data monitoring, on-line reduction and storage.

API Proposal

Informal requirements

This is a place to collect functional requirements, examples of intended use and expected behavior for the data collection framework.

Null

I don't want to collect any data from my simulation. In this case I expect that I don't need to use (or even know about) the data collection framework. I also expect no noticeable performance degradation of my legacy ns-3 applications.

Global data collection window

I want to set up the data collection start and stop times. All events before the start and after the stop will be ignored by this framework.

Local data collection window

I want to be able to individually change the data collection window for every enabled data source.
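
A minimal sketch of how a global window with a per-source override might behave; the Window and DataSource types and all member names are illustrative, not an existing ns-3 API:

  // Hypothetical sketch of a data collection window (illustrative names only).
  #include <iostream>

  struct Window
  {
    double start;   // simulation seconds
    double stop;
    bool Contains (double t) const { return t >= start && t <= stop; }
  };

  struct DataSource
  {
    Window window;   // effective window for this source
    void Event (double t, double value)
    {
      if (!window.Contains (t))
        {
          return;    // outside the window: the event is ignored
        }
      std::cout << "accepted sample (" << t << ", " << value << ")\n";
    }
  };

  int main ()
  {
    Window globalWindow = { 10.0, 100.0 };   // framework-wide default
    DataSource phyBusy;
    phyBusy.window = globalWindow;           // inherit the global window ...
    phyBusy.window.stop = 50.0;              // ... and override it locally

    phyBusy.Event (5.0, 1.0);    // before start: ignored
    phyBusy.Event (20.0, 1.0);   // inside the window: accepted
    phyBusy.Event (60.0, 1.0);   // after the local stop: ignored
    return 0;
  }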

Local scalar counter

I have some interesting event in my model and I have a number of model instances. I want the data collection framework to count the number of events and, at any time during the simulation, access the per-instance values of the counter for some (maybe all) model instances. I expect that to do this I will need a single line of code for every model instance being monitored, or one line to enable data collection from all already created instances. I expect that the data collection configuration will give me some "handles" to address counter values for individual model instances. I'd like to have a simple way to apply avg, sum, min, max, var, ... functions to the vector of these "handles", as well as to save all values to the output database.
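
A possible sketch of the "handles" idea, assuming a hypothetical CounterHandle type and free-standing Sum/Max helpers (none of this is an existing ns-3 interface):

  #include <algorithm>
  #include <iostream>
  #include <numeric>
  #include <vector>

  class CounterHandle
  {
  public:
    void Increment () { ++m_count; }          // hooked to the model instance's event
    double Value () const { return m_count; }
  private:
    double m_count = 0;
  };

  double Sum (const std::vector<CounterHandle> &h)
  {
    return std::accumulate (h.begin (), h.end (), 0.0,
                            [] (double acc, const CounterHandle &c)
                            { return acc + c.Value (); });
  }

  double Max (const std::vector<CounterHandle> &h)
  {
    double m = 0.0;
    for (const CounterHandle &c : h) { m = std::max (m, c.Value ()); }
    return m;
  }

  int main ()
  {
    // One counter per monitored model instance; in the proposed framework each
    // handle would be obtained with one line of configuration per instance.
    std::vector<CounterHandle> rxDrops (3);
    rxDrops[0].Increment ();
    rxDrops[0].Increment ();
    rxDrops[2].Increment ();

    std::cout << "sum = " << Sum (rxDrops)
              << ", avg = " << Sum (rxDrops) / rxDrops.size ()
              << ", max = " << Max (rxDrops) << "\n";   // sum = 3, avg = 1, max = 2
    return 0;
  }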

Global scalar counter

As above, but I just want to access the sum of the counter values over all model instances at the end of the simulation. I expect that to do this I need to add a single line of code to configure data collection and one more line to configure how this data will be stored. I want to both store this single value to the output database and access it as a variable from the code.

Local vector double

As above, I have an interesting event in my model (e.g. a packet was queued), but this time there is an additional double value for that event (e.g. the packet size). All events in the data collection window result in a sample of (time, double) pairs for every model instance. I want to access this sample (iterate over it, store it to the output database, apply a function to it).
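
A sketch of what such a per-instance sample could look like as plain (time, value) pairs; the Sample alias is illustrative:

  #include <cstdio>
  #include <utility>
  #include <vector>

  using Sample = std::vector<std::pair<double, double> >;   // (time, value)

  int main ()
  {
    // Per-instance sample of "packet queued" events carrying the packet size.
    Sample queuedBytes = { {0.1, 512}, {0.4, 1500}, {1.2, 1500} };

    double sum = 0.0;
    for (const auto &s : queuedBytes)
      {
        std::printf ("t=%.2f value=%.0f\n", s.first, s.second);   // iterate / store
        sum += s.second;                                          // apply a function
      }
    std::printf ("sum=%.0f\n", sum);
    return 0;
  }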

Global vector double

As above, but I don't care which model instance fires the events. I want to access the global (time, double) sample produced by several (maybe all) model instances.
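
A sketch of how such a global sample relates to the per-instance ones: it is simply the time-ordered merge of the samples of all contributing instances (the Sample alias is illustrative):

  #include <algorithm>
  #include <cstdio>
  #include <utility>
  #include <vector>

  using Sample = std::vector<std::pair<double, double> >;   // (time, value)

  int main ()
  {
    Sample nodeA = { {0.1, 512}, {2.0, 1500} };
    Sample nodeB = { {0.7, 256}, {1.4, 1024} };

    // The framework would merge incrementally as events arrive; here we just
    // concatenate and sort by time to obtain the instance-agnostic sample.
    Sample global (nodeA);
    global.insert (global.end (), nodeB.begin (), nodeB.end ());
    std::sort (global.begin (), global.end ());   // pairs compare by time first

    for (const auto &s : global)
      {
        std::printf ("t=%.1f value=%.0f\n", s.first, s.second);
      }
    return 0;
  }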

Vector to scalar reduction

I want to automatically apply some function to the vector double statistics. The following functions must be "built-in":

  • sample size
  • sample total time
  • last value, last timestamp
  • sum
  • sample average (\sum x / size)
  • time average (\sum (x * \delta t) / time)
  • sum / time
  • min, max, var

I should be able to write my own reduction function. The function output can be accessed as a scalar double, like the counter above. Since vectors can be huge, I need an option of keeping only the reduced value in memory / in the output database. An arbitrary number of reduced scalars can be obtained from the same vector.
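
A sketch of the intended reduction interface, assuming a reduction is simply a function mapping a (time, value) sample to a double; TimeAverage follows the \sum (x * \delta t) / time definition above, and Median stands in for a user-written reduction (all names illustrative):

  #include <algorithm>
  #include <cstddef>
  #include <iostream>
  #include <utility>
  #include <vector>

  using Sample = std::vector<std::pair<double, double> >;   // (time, value), sorted by time

  // Built-in style reduction: each value is weighted by the time it is "held",
  // i.e. until the next sample (or until stopTime for the last one).
  double TimeAverage (const Sample &s, double stopTime)
  {
    if (s.empty ()) return 0.0;
    double weighted = 0.0;
    for (std::size_t i = 0; i < s.size (); ++i)
      {
        double next = (i + 1 < s.size ()) ? s[i + 1].first : stopTime;
        weighted += s[i].second * (next - s[i].first);
      }
    return weighted / (stopTime - s.front ().first);
  }

  // User-written reduction: median of the observed values.
  double Median (Sample s)
  {
    if (s.empty ()) return 0.0;
    std::size_t mid = s.size () / 2;
    std::nth_element (s.begin (), s.begin () + mid, s.end (),
                      [] (const std::pair<double, double> &a,
                          const std::pair<double, double> &b)
                      { return a.second < b.second; });
    return s[mid].second;
  }

  int main ()
  {
    Sample s = { {0.0, 1.0}, {1.0, 3.0}, {3.0, 2.0} };
    std::cout << "time average = " << TimeAverage (s, 4.0) << "\n";  // (1*1 + 3*2 + 2*1)/4 = 2.25
    std::cout << "median       = " << Median (s) << "\n";           // 2
    return 0;
  }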

Use case: wifi throughput

The classical experiment of measuring aggregate wifi throughput à la Bianchi is an illustration of the reduced (sum/time) global double use case (packet size received at the wifi MAC).

Use case: wifi channel utilization

I want to calculate per-device wifi channel utilization, defined as the fraction of time the PHY was not IDLE. To do this, I can add a new trace source with a single double argument and call it with 1.0 when the PHY leaves the IDLE state and with 0.0 when the PHY becomes IDLE again. Then I apply the "time average" reduction (a reduced local double). Alternatively, I can do this in a non-intrusive way by writing an adapter class which listens to PHY state change events of the form (old_state, new_state) and produces data source events as above.
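
A possible shape for the non-intrusive adapter; the PhyState enumeration, the (old_state, new_state) callback and the trailing time-average computation are illustrative, not the real WifiPhy trace signature:

  #include <iostream>
  #include <utility>
  #include <vector>

  enum class PhyState { IDLE, TX, RX, CCA_BUSY };

  class BusyAdapter
  {
  public:
    // To be connected to the PHY "state changed" event.
    void NotifyStateChange (double now, PhyState oldState, PhyState newState)
    {
      if (oldState == PhyState::IDLE && newState != PhyState::IDLE)
        {
          m_samples.push_back ({now, 1.0});   // channel became busy
        }
      else if (oldState != PhyState::IDLE && newState == PhyState::IDLE)
        {
          m_samples.push_back ({now, 0.0});   // channel became idle again
        }
    }
    const std::vector<std::pair<double, double> > &Samples () const { return m_samples; }

  private:
    std::vector<std::pair<double, double> > m_samples;   // (time, 0/1)
  };

  int main ()
  {
    BusyAdapter adapter;
    adapter.NotifyStateChange (0.0, PhyState::IDLE, PhyState::RX);
    adapter.NotifyStateChange (2.0, PhyState::RX, PhyState::IDLE);
    adapter.NotifyStateChange (3.0, PhyState::IDLE, PhyState::TX);
    adapter.NotifyStateChange (3.5, PhyState::TX, PhyState::IDLE);

    // "Time average" over [0, 4]: busy for 2.0 s + 0.5 s out of 4 s = 0.625.
    double busy = 0.0, last = 0.0, value = 0.0;
    for (const auto &s : adapter.Samples ())
      {
        busy += value * (s.first - last);
        last = s.first;
        value = s.second;
      }
    busy += value * (4.0 - last);
    std::cout << "utilization = " << busy / 4.0 << "\n";
    return 0;
  }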

Use case: average MANET node degree

I want to access the time- and node-averaged node degree (number of neighbors) in some particular MANET scenario using OLSR routing. To do this, I can add a new double data source "Number of neighbors" to the OLSR protocol, fire it whenever the number of neighbors changes, and reduce these per-node vectors to per-node time averages. At the end of the simulation I will apply the "avg" function to the resulting set of per-node values to get the node- and time-averaged degree.

Vector resampling

I want to automatically convert my "raw" vector of (time, double) samples to a new vector of (time, double) samples in such a way that the new time values are strictly periodic and _globally_ synchronous. Example: with a slot size of 2, the vector of samples {(0, x1), (1, x2), (3.5, x3), (4.1, x4)} is resampled to the vector {(0, y1), (2, y2), (4, y3)}. The set of original samples falling inside the same time "slot" of the resampled vector produces one value which is representative of that slot. The resampled value is computed in the same way as when the whole vector is reduced to a scalar, see above. The same built-in reduction functions are supported:

  • count (= number of events in the slot, = sub-sample size)
  • last value, last timestamp
  • sum
  • sub-sample average (\sum x / size)
  • time average (\sum (x * \delta t) / time)
  • sum / time
  • min, max, var

User-specified functions are supported too. The size of the time slot for resampling is restricted to be the same for _all_ data vectors (a global framework parameter), and all resampled vectors are synchronous within the corresponding data collection windows. Vectors resampled in this way can be compared slot by slot.
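
A sketch of resampling with the "count" reduction, using the example vector from above and a slot size of 2; the slot-origin convention and the names are illustrative, and empty slots are simply not emitted here:

  #include <cmath>
  #include <iostream>
  #include <map>
  #include <utility>
  #include <vector>

  using Sample = std::vector<std::pair<double, double> >;   // (time, value)

  // Returns slot start time -> number of raw samples falling into that slot.
  std::map<double, double> ResampleCount (const Sample &raw, double slot)
  {
    std::map<double, double> out;
    for (const auto &s : raw)
      {
        double slotStart = std::floor (s.first / slot) * slot;
        out[slotStart] += 1.0;
      }
    return out;
  }

  int main ()
  {
    Sample raw = { {0.0, 1.0}, {1.0, 2.0}, {3.5, 3.0}, {4.1, 4.0} };
    for (const auto &kv : ResampleCount (raw, 2.0))
      {
        std::cout << "slot " << kv.first << ": count " << kv.second << "\n";
      }
    // Prints: slot 0: count 2, slot 2: count 1, slot 4: count 1
    return 0;
  }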

Functions on vectors and pipelines

I want to apply a function to several (in general, resampled) vectors to produce a single new vector. This vector can be used in the same way as the original ones: fed to the input of another function, reduced to a scalar, or written to the output database.

Use case: moving window average

I can apply a moving window average (the window size being a multiple of the time slot) to detect the end of the transient process in my observable.
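
A sketch of such a moving window average over per-slot values, assuming the window is an integer number of slots:

  #include <cstddef>
  #include <iostream>
  #include <vector>

  std::vector<double> MovingAverage (const std::vector<double> &v, std::size_t window)
  {
    std::vector<double> out;
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size (); ++i)
      {
        sum += v[i];
        if (i + 1 >= window)
          {
            out.push_back (sum / window);
            sum -= v[i + 1 - window];   // slide the window forward
          }
      }
    return out;
  }

  int main ()
  {
    // One value per resampling slot, e.g. per-slot throughput.
    std::vector<double> perSlot = { 0.0, 2.0, 8.0, 9.0, 10.0, 10.0 };
    for (double a : MovingAverage (perSlot, 3))
      {
        std::cout << a << " ";   // 3.33 6.33 9 9.67 -- levels off once the transient ends
      }
    std::cout << "\n";
    return 0;
  }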

Use case: packet delivery ratio

I want to measure the PDR of a single CBR stream. The CBR stream is implemented as two applications: a traffic source and a traffic sink. The source application fires a "packet send" event, which produces a vector of counters of the form (time, 1.0). The sink application fires a "packet received" event, which produces another vector of counters. Both vectors are resampled with the "count" function. The resampled vectors are inputs to the "a/b" function, and the output vector is a time-dependent packet delivery ratio.
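
A sketch of this pipeline, assuming both event streams are available as timestamp lists; CountPerSlot and Divide are illustrative stand-ins for the "count" resampling and the "a/b" function:

  #include <cmath>
  #include <iostream>
  #include <map>
  #include <vector>

  using Events = std::vector<double>;        // event timestamps
  using Slots  = std::map<double, double>;   // slot start time -> value

  Slots CountPerSlot (const Events &times, double slot)
  {
    Slots out;
    for (double t : times)
      {
        out[std::floor (t / slot) * slot] += 1.0;
      }
    return out;
  }

  // "a/b" applied slot by slot; slots with no sent packets are skipped.
  Slots Divide (const Slots &a, const Slots &b)
  {
    Slots out;
    for (const auto &kv : b)
      {
        auto it = a.find (kv.first);
        out[kv.first] = (it == a.end () ? 0.0 : it->second) / kv.second;
      }
    return out;
  }

  int main ()
  {
    Events sent = { 0.1, 0.6, 1.2, 1.8, 2.3 };
    Events recv = { 0.15, 1.25, 1.85, 2.35 };   // one packet lost in the first slot

    Slots pdr = Divide (CountPerSlot (recv, 1.0), CountPerSlot (sent, 1.0));
    for (const auto &kv : pdr)
      {
        std::cout << "t=" << kv.first << " PDR=" << kv.second << "\n";   // 0.5, 1, 1
      }
    return 0;
  }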

Use case: ITU R-factor

As above, but I want to automatically measure the ITU E-model R-factor, which is a function of PDR and average packet delay. To do this, the sink application from above also fires "packet received" events with a double delay parameter. This vector is resampled and fed to the R-factor calculation module together with the PDR vector. The overall pipeline looks like this:

        Packet send
Source: ---------------\
                       |  Packet delivery ratio
        Packet recvd   |--------------------------\
Sink  : ---------------/                          |  R-factor
        delay                                     |-----------------
        ------------------------------------------/

Local factor

I want to access per-instance model attributes using the data collection framework on the same basis as the counter above, e.g. to store them to the output database as factors (inputs) of my experimental plan. Attributes can be integers/doubles, strings (e.g. "11Mbps" or "Minstrel"), or booleans (true/false or on/off semantics), and the last value will be recorded if the attribute changes over time.
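
One possible way to keep such factors, sketched here as a table of string values keyed by instance and attribute name, where a later record simply overwrites an earlier one so that the last value wins (all names are hypothetical):

  #include <iostream>
  #include <map>
  #include <string>
  #include <utility>

  using FactorKey = std::pair<std::string, std::string>;   // (instance id, attribute name)

  class FactorTable
  {
  public:
    void Record (const std::string &instance, const std::string &attribute,
                 const std::string &value)
    {
      m_factors[{instance, attribute}] = value;   // keeps only the last value
    }
    void Dump () const
    {
      for (const auto &kv : m_factors)
        {
          std::cout << kv.first.first << "." << kv.first.second
                    << " = " << kv.second << "\n";
        }
    }
  private:
    std::map<FactorKey, std::string> m_factors;   // mixed types stored as strings
  };

  int main ()
  {
    FactorTable factors;
    factors.Record ("wifi-mac-0", "RateControl", "Minstrel");   // string factor
    factors.Record ("wifi-mac-0", "Slot", "9us");
    factors.Record ("wifi-mac-0", "Slot", "20us");              // last value is kept
    factors.Dump ();
    return 0;
  }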

Global factor

In my simulation all instances of the same model have the same value of some particular attribute (e.g. the same Slot for all WifiMac instances). I want to access this value as above, using the model TypeId. I want to know what will happen if different instances of the model do have different values for this attribute.

Background and related work