Data Collection Framework
Latest revision as of 17:59, 24 April 2015
Scope
Data Collection Framework is responsible for simulation data monitoring, on-line reduction and storage. This project has been under way for several years, so this wiki page collects information across a long timespan; some of it has been overtaken by events.
Arch and API Proposal
This is the original design/API proposal discussed between Pavel Boyko, Felipe Perrone, and Mathieu Lacage.
The simulation data collection process is a three-stage pipeline:
1. Models report events using the existing trace source mechanism.
2. A number of probes are set up to listen for interesting events. Probes are responsible for on-line data normalization and reduction. Probes report reduced data using the trace source mechanism; this allows the user to organize probes into trees.
3. One or more collectors are set up to listen to interesting probes. Collectors are responsible for storing data in some meaningful format.
The following design restrictions are suggested:
1. Probes should not generate data by themselves. This restriction allows probes to be reusable.
2. Collectors should not change the data they receive from probes. This restriction allows collectors to be reusable.
A proof-of-concept implementation of this architecture is located here: http://codereview.appspot.com/3105042 and is explained in some detail below.
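The three stages can be illustrated with a minimal, ns-3-independent sketch. Everything in it is hypothetical and invented for illustration (SketchTraceSource, QueueModel, the probe and collector classes); it is not the proof-of-concept code. It only shows how a model event flows through a probe into a collector while respecting the two design restrictions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ns-3 trace source: a list of callbacks.
template <typename... Args>
class SketchTraceSource {
public:
  void Connect(std::function<void(Args...)> cb) {
    assert(static_cast<bool>(cb));  // an empty callback would throw in Fire
    m_sinks.push_back(std::move(cb));
  }
  void Fire(Args... args) const {
    for (const auto& sink : m_sinks) sink(args...);
  }
private:
  std::vector<std::function<void(Args...)>> m_sinks;
};

// Stage 1: a model reports events and knows nothing about probes.
struct QueueModel {
  SketchTraceSource<double, std::uint32_t> enqueue;  // (time, bytes)
};

// Stage 2: a probe reduces events to a (time, double) stream and re-exports
// it as a trace source. It never generates data by itself.
class SketchCountProbe {
public:
  SketchTraceSource<double, double> output;
  void OnEvent(double time, std::uint32_t /*bytes*/) {
    ++m_count;
    output.Fire(time, static_cast<double>(m_count));
  }
private:
  std::uint64_t m_count = 0;
};

// Stage 3: a collector stores what it receives without changing it.
class SketchVectorCollector {
public:
  void OnSample(double time, double value) { m_samples.emplace_back(time, value); }
  const std::vector<std::pair<double, double>>& Samples() const { return m_samples; }
private:
  std::vector<std::pair<double, double>> m_samples;
};

// Wires the three stages together and returns the collected series.
std::vector<std::pair<double, double>> RunPipelineDemo() {
  QueueModel model;
  SketchCountProbe probe;
  SketchVectorCollector collector;
  model.enqueue.Connect([&probe](double t, std::uint32_t b) { probe.OnEvent(t, b); });
  probe.output.Connect([&collector](double t, double v) { collector.OnSample(t, v); });
  model.enqueue.Fire(0.5, 100);
  model.enqueue.Fire(1.25, 1500);
  return collector.Samples();
}
```

Because the probe exposes its output as another trace source, a second probe could be connected to it in exactly the same way, which is what makes probe trees possible.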
Events & Probes
Events can be of any type (= trace source signature); in fact, every trace source can be considered an event for the data collection framework.
To be visible to the data collection framework, an event must be listened to by some probe object. It is the probe's responsibility to match the event signature and produce some meaningful output. The simplest probe can listen for (void) events and count them.
Probes report their output data streams as trace sources; e.g., a probe reporting a {time, double} time series can have an "Output" trace source with a (Time, double) signature. This allows the user to create probe trees. Note that while the number of probe inputs is not restricted here, it is recommended that every probe have a single output.
The following probe hierarchy is implemented in the proof-of-concept example:
- Probe
- DoubleProbe
- CountProbe
- DivProbe
- StringProbe
- AttributeProbe
It is assumed that a large number of "basic" probes will be created, implementing all simple data reduction/normalization operations. Users can add their own probes as well.
Collectors
Collectors are responsible for listening to probes and storing data in some user-meaningful format. In the current implementation, the collector also owns all registered probes (i.e., controls their lifetime). Only supported probe types can be registered on a collector, because the collector must know how to store the probe output data. This is the only place in the data collection pipeline where the data type is restricted.
CsvCollector in the proof-of-concept example supports string and double time series (StringProbe and DoubleProbe and their subclasses) and stores every registered probe as a single CSV file. Many more useful collectors can be created.
There can be more than one collector in a running application. I can imagine one collector saving output data series, another one handling transient detection and run termination, and a last one saving mobility/topology events in a visualization-compatible format.
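As a rough illustration of why a collector restricts the data types it accepts, here is a small sketch of a CSV-style collector. The class name CsvSketchCollector and its methods are assumptions made for this example, not the CsvCollector API from the proof of concept, and it keeps each series as an in-memory string rather than writing a file.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical collector: one CSV text buffer per registered probe name.
// Only double and string series are accepted, because these are the only
// types the collector knows how to format.
class CsvSketchCollector {
public:
  // Registers a probe by name; returns false if the name is already taken.
  bool Register(const std::string& probe) {
    return m_csv.emplace(probe, "time,value\n").second;
  }
  void Record(const std::string& probe, double time, double value) {
    Append(probe, time, Format(value));
  }
  void Record(const std::string& probe, double time, const std::string& value) {
    Append(probe, time, '"' + value + '"');
  }
  // Returns the CSV text accumulated so far for one probe.
  const std::string& Csv(const std::string& probe) const { return m_csv.at(probe); }

private:
  static std::string Format(double x) {
    std::ostringstream os;
    os << x;
    return os.str();
  }
  void Append(const std::string& probe, double time, const std::string& cell) {
    auto it = m_csv.find(probe);
    assert(it != m_csv.end());  // the probe must be registered first
    it->second += Format(time) + "," + cell + "\n";
  }
  std::map<std::string, std::string> m_csv;
};
```

The two Record overloads are the only entry points, so an unsupported probe output type is rejected at compile time in this sketch; a real collector would more likely check the probe type at registration.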
Informal requirements
This is a place to collect functional requirements, examples of intended use and expected behavior for the data collection framework.
Null
I don't want to collect any data from my simulation. In this case I expect that I don't need to use (or even know about) the data collection framework. I also expect no notable performance degradation of my legacy ns-3 applications.
Global data collection window
I want to set up data collection start and stop times. All events before the start time and after the stop time will be ignored by the framework.
Local data collection window
I want to be able to individually change the data collection window for every enabled data source.
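The two window requirements could interact as in the following sketch, where a per-source window, when set, overrides the global one. The names CollectionWindow and WindowedEventCounter are hypothetical, and treating the window as a closed interval [start, stop] is an assumption; the requirements above do not say how boundary events are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Closed interval [start, stop] in simulation seconds (assumed boundary rule).
struct CollectionWindow {
  double start = 0.0;
  double stop = std::numeric_limits<double>::infinity();
  bool Contains(double t) const { return t >= start && t <= stop; }
};

// Hypothetical data source that counts events inside its effective window.
class WindowedEventCounter {
public:
  explicit WindowedEventCounter(const CollectionWindow& global) : m_global(global) {
    assert(global.start <= global.stop);
  }
  // Overrides the global window for this data source only.
  void SetLocalWindow(const CollectionWindow& local) { m_local = local; }
  void OnEvent(double time) {
    const CollectionWindow& w = m_local ? *m_local : m_global;
    if (w.Contains(time)) ++m_count;  // events outside the window are ignored
  }
  std::uint64_t Count() const { return m_count; }

private:
  CollectionWindow m_global;
  std::optional<CollectionWindow> m_local;
  std::uint64_t m_count = 0;
};
```

An alternative design would intersect the local window with the global one instead of overriding it; which behavior is wanted is left open by the requirement.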
Local scalar counter
I have some interesting event in my model and I have a number of model instances. I want the data collection framework to count the number of events and, at any time in the simulation, to give me access to the per-instance values of the counter for some (maybe all) model instances. I expect that to do this I will need a single line of code for every model instance being monitored, or one line to enable data collection from all already created instances. I expect that the data collection configuration will give me some "handles" to address counter values for individual model instances. I'd like to have a simple way to apply avg, sum, min, max, var, ... functions to the vector of these "handles", as well as to save all values to the output database.
Global scalar counter
As above, but I just want to access the sum of the counter values over all model instances at the end of the simulation. I expect that to do this I need to add a single line of code to configure data collection and one more line to configure how this data will be stored. I want both to store this single value in the output database and to access it as a variable from the code.
Local vector double
As above, I have an interesting event in my model (e.g. a packet was queued), but this time I have an additional double value for that event (e.g. packet size). All events in the data collection window result in a sample of (time, double) pairs for every model instance. I want to access (iterate over, store to the output database, apply a function to) this sample.
Global vector double
As above, but I don't care which model instance fires the events. I want to access the global sample of (time, double) pairs produced by several (maybe all) model instances.
Vector to scalar reduction
I want to automatically apply some function to the vector double statistics. The following functions must be "built-in":
- sample size
- sample total time
- last value, last timestamp
- sum
- sample average (\sum x / size)
- time average (\sum (x * \delta t) / time)
- sum / time
- min, max, var
I should be able to write my own reduction function. The function output can be accessed as a scalar double, like the counter from above. Since vectors can be huge, I need an option of keeping only the reduced value in memory / the output database. An arbitrary number of reduced scalars can be obtained from the same vector.
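To make the difference between three of the built-in reductions concrete, here is a small sketch. The names are hypothetical, and the time average assumes that each sample value holds from its own timestamp until the next sample (or until the end of the window for the last one); the requirement above does not fix this interpretation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TimedSample {
  double time;
  double value;
};

// Sample average: (sum of x) / size.
double SampleAverage(const std::vector<TimedSample>& v) {
  assert(!v.empty());
  double sum = 0.0;
  for (const auto& s : v) sum += s.value;
  return sum / static_cast<double>(v.size());
}

// Time average: sum(x_i * dt_i) / total time, where x_i is assumed to hold
// from its own timestamp until the next sample (windowEnd for the last one).
double TimeAverage(const std::vector<TimedSample>& v, double windowEnd) {
  assert(!v.empty() && windowEnd > v.front().time);
  double weighted = 0.0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    const double next = (i + 1 < v.size()) ? v[i + 1].time : windowEnd;
    weighted += v[i].value * (next - v[i].time);
  }
  return weighted / (windowEnd - v.front().time);
}

// Sum over time: (sum of x) / window length, e.g. bytes received per second.
double SumOverTime(const std::vector<TimedSample>& v, double windowStart,
                   double windowEnd) {
  assert(windowEnd > windowStart);
  double sum = 0.0;
  for (const auto& s : v) sum += s.value;
  return sum / (windowEnd - windowStart);
}
```

For the samples (0, 2), (1, 4), (3, 10) and a window ending at t = 4, the time average is (2*1 + 4*2 + 10*1) / 4 = 5, the sample average is 16 / 3, and sum/time is 16 / 4 = 4, so the three reductions generally give different scalars for the same vector.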
Use case: wifi throughput
The classical experiment of measuring aggregate wifi throughput à la Bianchi illustrates the reduced (sum/time) global double use case, where the double is the size of each packet received at the wifi MAC.
Use case: wifi channel utilization
I want to calculate per-device wifi channel utilization, defined as the fraction of time the PHY was not IDLE. To do this, I can add a new trace source with a single double argument and call it with 1.0 when the PHY leaves the IDLE state and with 0.0 when the PHY becomes IDLE. Then I apply the "time average" reduction (reduced local double). Alternatively, I can do this in a non-intrusive way by writing an adapter class that listens to PHY state change events of the form (old_state, new_state) and produces data source events as above.
Use case: average MANET node degree
I want to access the time- and node-averaged node degree (number of neighbors) in some particular MANET scenario using OLSR routing. To do this, I can add a new double data source "Number of neighbors" to the OLSR protocol, fire it when the number of neighbors changes, and reduce these per-node vectors to per-node time averages. At the end of the simulation I will apply the "avg" function to the resulting set of per-node values to get the node- and time-averaged degree.
Vector resampling
I want to automatically convert my "raw" vector of (time, double) samples to a new vector of (time, double) samples in such a way that the new time values are strictly periodic and _globally_ synchronous. Example: the vector of samples {(0, x1), (1, x2), (3.5, x3), (4.1, x4)} is resampled to the vector {(0, y1), (2, y2), (4, y3)}. The set of original samples inside the same time "slot" of the resampled vector produces one value that is representative of the "slot". The resampled value is computed in the same way as the whole vector is reduced to a scalar, see above. The same built-in reduction functions are supported:
- count (= number of events in the slot, = sub-sample size)
- last value, last timestamp
- sum
- sub-sample average (\sum x / size)
- time average (\sum (x * \delta t) / time)
- sum / time
- min, max, var
User-specified functions are supported too. The size of the time slot for resampling is restricted to be the same for _all_ data vectors (a global framework parameter), and all resampled vectors are synchronous in the corresponding data collection windows. Vectors resampled in this way can be compared slot by slot.
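The resampling requirement can be sketched as follows, using the example vector from above with a slot size of 2. The names (SlotSample, SlotReducer, Resample) are hypothetical. The sketch assumes slots of the form [k*slot, (k+1)*slot) counted from time 0, and it passes an empty sub-sample to the reducer when no event falls into a slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct SlotSample {
  double time;
  double value;
};
using SlotReducer = std::function<double(const std::vector<double>&)>;

// Built-in "count": number of events in the slot.
double CountReducer(const std::vector<double>& xs) {
  return static_cast<double>(xs.size());
}
// Built-in "sum": sum of the values in the slot.
double SumReducer(const std::vector<double>& xs) {
  double s = 0.0;
  for (double x : xs) s += x;
  return s;
}

// Resamples `raw` onto slots [k*slot, (k+1)*slot) covering [0, stop).
// Slot boundaries are multiples of `slot` from time 0, so all vectors
// resampled with the same slot size are globally synchronous.
std::vector<SlotSample> Resample(const std::vector<SlotSample>& raw, double slot,
                                 double stop, const SlotReducer& reduce) {
  assert(slot > 0.0 && stop > 0.0);
  const auto slots = static_cast<std::size_t>(std::ceil(stop / slot));
  std::vector<std::vector<double>> buckets(slots);
  for (const auto& p : raw) {
    if (p.time < 0.0 || p.time >= stop) continue;  // outside the window
    const auto k = static_cast<std::size_t>(p.time / slot);
    buckets[std::min(k, slots - 1)].push_back(p.value);  // guard rounding
  }
  std::vector<SlotSample> out;
  out.reserve(slots);
  for (std::size_t k = 0; k < slots; ++k) {
    out.push_back({static_cast<double>(k) * slot, reduce(buckets[k])});
  }
  return out;
}
```

A user-specified reduction is just another callable with the SlotReducer signature; it has to decide for itself what to return for an empty slot.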
Functions on vectors and pipelines
I want to apply a function to several (in general) resampled vectors to produce a single new vector. This vector can be used in the same way as the original ones: fed to the input of another function, reduced to a scalar, or written to the output database.
Use case: moving window average
I can apply moving window average (window size is a multiple of the time slot) to detect the end of transient process in my observable.
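A trailing moving average over a resampled vector might look like the sketch below. The function name is hypothetical and the window is expressed as a number of slots; how the smoothed series is then used to decide that the transient has ended is not specified here and is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trailing moving average over a resampled vector; `window` is in slots.
// Output element j is the mean of input slots j .. j + window - 1, so the
// result has (size - window + 1) elements, or none if the input is shorter
// than the window.
std::vector<double> MovingAverage(const std::vector<double>& slots,
                                  std::size_t window) {
  assert(window > 0);
  std::vector<double> out;
  if (slots.size() < window) return out;
  double sum = 0.0;
  for (std::size_t i = 0; i < slots.size(); ++i) {
    sum += slots[i];
    if (i >= window) sum -= slots[i - window];  // drop the slot leaving the window
    if (i + 1 >= window) out.push_back(sum / static_cast<double>(window));
  }
  return out;
}
```

For the input {1, 2, 3, 4, 5} and a window of 2 slots, the result is {1.5, 2.5, 3.5, 4.5}.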
Use case: packet delivery ratio
I want to measure the PDR of a single CBR stream. The CBR stream is implemented as two applications: a traffic source and a traffic sink. The source application fires a "packet send" event; this produces a vector of counters of the form (time, 1.0). The sink application fires a "packet received" event; this produces another vector of counters. Both vectors are resampled with the "count" function. The resampled vectors are inputs to the "a/b" function, and the output vector is a time-dependent packet delivery ratio.
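The PDR pipeline can be sketched as two "count" resamplings followed by an element-wise "a/b" function. The function names are hypothetical, and returning NaN for slots in which nothing was sent is an assumption of this sketch, since the use case does not say how such slots should be reported.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Counts the events whose timestamps fall in each slot [k*slot, (k+1)*slot).
std::vector<double> CountPerSlot(const std::vector<double>& times, double slot,
                                 std::size_t slots) {
  assert(slot > 0.0 && slots > 0);
  std::vector<double> counts(slots, 0.0);
  const double stop = slot * static_cast<double>(slots);
  for (double t : times) {
    if (t < 0.0 || t >= stop) continue;  // outside the collection window
    const auto k = static_cast<std::size_t>(t / slot);
    counts[std::min(k, slots - 1)] += 1.0;
  }
  return counts;
}

// Element-wise a/b on two synchronous resampled vectors.
// Slots with b == 0 yield NaN (assumed convention for "undefined").
std::vector<double> Ratio(const std::vector<double>& a,
                          const std::vector<double>& b) {
  assert(a.size() == b.size());
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = b[i] > 0.0 ? a[i] / b[i]
                        : std::numeric_limits<double>::quiet_NaN();
  }
  return out;
}
```

Note that a packet sent near the end of one slot and received in the next is counted in different slots on the two sides, so a per-slot ratio can exceed 1; a larger slot size or a moving window average reduces this effect.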
Use case: ITU R-factor
As above, but I want to automatically measure the ITU E-model R factor, which is a function of PDR and average packet delay. To do this, the sink application from above also fires "packet received" events with a double delay parameter. This vector is resampled and fed to the R-factor calculation module together with the PDR vector. The total pipeline looks this way:
          Packet send
 Source: ---------------\
                         | Packet delivery ratio
          Packet recvd   |--------------------------\
 Sink  : ---------------/                            | R-factor
          delay                                      |-----------------
         ------------------------------------------/
Local factor
I want to access per-instance model attributes using the data collection framework on the same basis as the counter above, e.g. to store them in the output database as factors (inputs) of my experimental plan. Attributes can be integers/doubles, strings (e.g. "11Mbps" or "Minstrel"), or booleans (true/false or on/off semantics), and the last value will be recorded if an attribute changes with time.
Global factor
In my simulation, all instances of the same model have the same value of some particular attribute (e.g. the same Slot for all WifiMac instances). I want to access this value as above using the model TypeId. I want to know what will happen if different instances of the model have different values for this attribute.
March 2013 code review
This is just a placeholder on the wiki to store some documentation related to the code review (please discuss on ns-3-reviews or within the code review issue): https://codereview.appspot.com/7436051/
- Slides summarizing this issue
- Draft of proposed manual documentation
July 2013 code review
The code review issue has been updated: https://codereview.appspot.com/10974043
The data collection code is intended for src/stats directory. New ns-3 manual documentation is posted here:
- Draft of proposed manual documentation: https://www.nsnam.org/wiki/index.php/File:Ns-3-data-collection-manual.pdf
Use cases
This section describes a few use cases that the framework is intended to be able to support (not all capabilities are implemented).
This supplements the discussion on informal requirements above.
interface bandwidth statistics
A user previously asked on ns-3-users:
Basically, what is suggested to take a percentage of how much bandwidth is taken in a point to point link? Is flowmonitor the right tool for this, because I have gone through the documentation of the module however I am not sure that I require all that complexity. basically, I would like to simple take bandwidth measurements at any given point in time during the simulation of certain designated links.
One could envision some kind of PointToPointHelper methods to print this information out, using BasicStatsCollector.
 /* Plot sending and receiving throughput, averaged at 1 second
    intervals, in a Gnuplot */
 PointToPointHelper::PlotInterfaceThroughput (Ptr<NetDevice> nd)
 PointToPointHelper::PlotInterfaceThroughputAll (const NetDeviceContainer &ndc)
 PointToPointHelper::WriteInterfaceThroughput ... /* File variant */
athstats helper
There is an athstats helper class that prints out formatted text statistics of the Wifi NetDevice corresponding to what the athstats tool might print out:
 examples/wireless/wifi-ap.cc:  AthstatsHelper athstats;
 examples/wireless/wifi-ap.cc:  athstats.EnableAthstats ("athstats-sta", stas);
 examples/wireless/wifi-ap.cc:  athstats.EnableAthstats ("athstats-ap", ap);
This was written before the data collection framework existed. It consists of a helper class that mainly hooks a number of traces in the WifiNetDevice, collects statistics from them, and periodically writes the statistics out to file, and resets:
 void DevTxTrace (std::string context, Ptr<const Packet> p);
 void DevRxTrace (std::string context, Ptr<const Packet> p);
 void TxRtsFailedTrace (std::string context, Mac48Address address);
 void TxDataFailedTrace (std::string context, Mac48Address address);
 void TxFinalRtsFailedTrace (std::string context, Mac48Address address);
 void TxFinalDataFailedTrace (std::string context, Mac48Address address);
 void PhyRxOkTrace (std::string context, Ptr<const Packet> packet, double snr, WifiMode mode, enum WifiPreamble preamble);
 void PhyRxErrorTrace (std::string context, Ptr<const Packet> packet, double snr);
 void PhyTxTrace (std::string context, Ptr<const Packet> packet, WifiMode mode, WifiPreamble preamble, uint8_t txPower);
 void PhyStateTrace (std::string context, Time start, Time duration, enum WifiPhy::State state);
The print to file is just a formatted printf:
snprintf (str, 200, "%8u %8u %7u %7u %7u %6u %6u %6u %7u %4u %3uM\n"
In the context of the data collection framework, this helper is analogous to a custom Collector object that hooks directly to trace sources (without probes) and also contains FileAggregator support (that is, it is a combined Collector+Aggregator).
The DCF way to write this would be as follows. First, if there were probes available for these trace signatures, they could be added, but this is not strictly necessary. The Athstats helper could still be written largely as a custom collector, but the file handling aspects could perhaps be handled by a stock file aggregator object, to which the specially formatted printf format string could be provided.
object start/stop time tracker
Vedran asked whether the Object Start/Object Stop time tracker could be implemented with DCF. Basically, this is a variation on the requirement stated above for "Wifi Throughput"; what is desired is something like a "Duty Cycle collector" that will keep track of the proportion of time that an object was on or off, and report statistics at the end of the simulation.
In discussing this use case, we observed that there is a need to actually stop the statistics framework at the end of the simulation, so that the time from the last recorded event to the end of the simulation is accounted for. That is, if the last state transition was at time t=1000 seconds, but the simulation ended at time t=2000 seconds, we need to ensure that this kind of report covers the time range of 1000-2000 seconds, so we need to somehow trigger the report at time t=2000 seconds even if there is no underlying state change to drive it. (Actually, this should be discussed further: it could cause problems with reliability analysis -- it actually did in my experiments -- as it can happen that a certain object whose state changes depend on another object remains in started or stopped state until the end of simulation because that other object ended its state changing time. Sorry if this explanation isn't very clear; I'm willing to elaborate further if necessary. -- Vedran Miletić)
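The end-of-simulation issue can be seen in a small sketch of a "Duty Cycle collector". The class DutyCycleTracker and its methods are hypothetical; in particular, the explicit Finalize call stands in for whatever mechanism the framework would use to trigger a final report when the simulation stops.

```cpp
#include <cassert>

// Hypothetical duty cycle collector: tracks time spent on versus off.
class DutyCycleTracker {
public:
  DutyCycleTracker(double startTime, bool initiallyOn)
      : m_lastChange(startTime), m_on(initiallyOn) {}

  // Called on every state transition reported by the traced object.
  void OnStateChange(double time, bool on) {
    Accumulate(time);
    m_on = on;
  }
  // Must be called once when the simulation ends: without it, the interval
  // between the last transition and the end of the run is never counted.
  void Finalize(double endTime) { Accumulate(endTime); }

  // Fraction of the accounted time during which the object was on.
  double OnFraction() const {
    const double total = m_onTime + m_offTime;
    return total > 0.0 ? m_onTime / total : 0.0;
  }

private:
  // Credits the time since the last change to the current state.
  void Accumulate(double now) {
    assert(now >= m_lastChange);
    (m_on ? m_onTime : m_offTime) += now - m_lastChange;
    m_lastChange = now;
  }
  double m_lastChange;
  bool m_on;
  double m_onTime = 0.0;
  double m_offTime = 0.0;
};
```

With an object that starts off at t=0, turns on at t=500 and off at t=1000, the tracker reports an on-fraction of 0.5 if queried right after the last transition, but 0.25 once Finalize (2000) has accounted for the remaining 1000 seconds in the off state.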
Development road map
The initial set of code was merged in the ns-3.18 release (easily enable the dumping of raw ns-3 trace source data into files and gnuplots):
- basic probes
- file aggregators
- gnuplot aggregators
- test code coverage for helpers, file and gnuplot aggregators, and additional probes
- add coverage in ns-3 tutorial
The ns-3.22 release added a TimeProbe class for traced values that emit values of type ns3::Time.
Prioritized list for ns-3.23 and future releases
Li Li, Felipe Perrone, and Tom Henderson are working on these enhancements planned for ns-3.23 if possible:
1) rename TimeSeriesAdaptor to EventDrivenCollector
2) Add two new collector types:
- TimeSeriesCollector (reports the most recently reported value of a traced value, at regular intervals)
- TimeAverageCollector (produce time-averaged data across regular intervals, such as data rates (e.g. packets per second))
By default, these will not output values when the probe hasn't yet fired during the enabled period, although this may become an optional mode in the future.
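The intended TimeSeriesCollector behavior, as described above, can be approximated offline by a sample-and-hold function. This is only a sketch of the semantics under stated assumptions: the names are hypothetical, reports are assumed to happen at t = period, 2*period, ... up to the stop time, and the input events are assumed to be sorted by time. It is not the planned class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct HeldSample {
  double time;
  double value;
};

// Reports the most recently seen value at every tick t = k * period <= stop.
// Ticks before the first event produce no output, matching the default
// behavior described above. `events` must be sorted by time.
std::vector<HeldSample> SampleAndHold(const std::vector<HeldSample>& events,
                                      double period, double stop) {
  assert(period > 0.0);
  std::vector<HeldSample> out;
  std::size_t next = 0;
  bool fired = false;
  double last = 0.0;
  for (std::size_t k = 1; static_cast<double>(k) * period <= stop; ++k) {
    const double tick = static_cast<double>(k) * period;
    // Consume every event that happened up to and including this tick.
    while (next < events.size() && events[next].time <= tick) {
      last = events[next].value;
      fired = true;
      ++next;
    }
    if (fired) out.push_back({tick, last});
  }
  return out;
}
```

For events (1.5, 10) and (3.7, 30) with a period of 1 and a stop time of 4, the tick at t=1 is skipped and the output is (2, 10), (3, 10), (4, 30).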
3) add ability to scale y-axis data of GnuplotHelper by a scale factor
4) add ability to GnuplotHelper to add data series to single plot
(likely cutoff for ns-3.23)
5) review and try to incorporate DCF code from Magister Solutions
6) extend base class of DataCollectionObject to allow for setting the Enable/Disable times in a single statement
7) allow users to learn all of the available traced values to be able to hook to, from the command line or from an output file
8) Several collectors are nearly done in the http://code.nsnam.org/safe/ns-3.21-collector/ repository, such as BiVariateCollector. Prepare code reviews as necessary.
9) Add capability to set time resolution on x-axis (defaults to Seconds)
Longer-term items
- perform offline steady state/transient detection
- perform online steady state/transient detection
- can we enable/disable multiple DCF elements in a path, with a single statement, by virtue of their relationships?
- Support more probes of all ns-3 trace sources
- BasicStatsCollector, helpers, and examples: compute statistics to reduce probe'd data before writing to files, plots, and databases
  - provide helper objects, similar to Pcap and Ascii trace helpers, that easily enable the generation of output files or gnuplots for device-level interface bandwidth usage statistics, for Lte, WiFi, WiMax, Csma, PointToPoint devices
- StateTrackerCollector: support the Wifi state machine, and Object Start/Stop time tracker use cases
- Migrate Joe Kopena's data collection code and example to the new framework
  - this will enable some SQLite database support
- Add ability to periodically write out interim results for long simulations
Some core ns-3 issues
- This bug (https://www.nsnam.org/bugzilla/show_bug.cgi?id=127) on the lack of type information in trace sources needs a solution, in order to clean up the helpers
- DCF heavily uses the configuration store and config path database; these configuration paths should be audited (bug 1213: https://www.nsnam.org/bugzilla/show_bug.cgi?id=1213)
Background and related work
- Joe Kopena's framework (stats module): Statistical Framework for Network Simulation
- Akaroa: http://www.cosc.canterbury.ac.nz/research/RG/net_sim/simulation_group/akaroa/about.chtml
- Poster at simutools: http://www.eg.bucknell.edu/~perrone/Research_files/poster.pdf
- Research paper on automation: http://www.eg.bucknell.edu/~perrone/Research_files/paper.pdf
- Another related paper: M. Andreozzi, G. Stea, C. Vallati, "A Framework for Large-scale Simulations and Output Result Analysis with ns-2", Proceedings of QoSim 2009, Rome, Italy, March 6, 2009 [11/24].