Validation means the ability to determine that a given model has some connection to reality -- that a model is an accurate representation of a real system. If you are going to use a simulation model to try to predict how some real system will behave, you must have some reason to believe your results; that is, you must be able to trust that an inference made from the model translates into a correct prediction for the real system. Validation of a model provides such a basis for trust.
Every model has a target system that it is attempting to simulate. The first step in writing a simulation is to identify this target system and the level of detail and accuracy that the simulation is desired to reproduce. Once this is done, one can develop an abstract model of the target system. This is typically an exercise in managing the tradeoffs between complexity, resource requirements and accuracy. This step has been called model qualification in the literature. This abstract model is then developed into an ns-3 model that implements the abstract model as a computer program. The process of getting the implementation to agree with the abstract model is called model verification in the literature. This is what people usually think of when they imagine software testing. The process of getting the ns-3 model behavior to agree with the target system behavior is called model validation in the literature. Since validation may ultimately be based on some non-deterministic physical process, it will likely be a statistical process.
The process used to validate a model is conceptually quite simple. One compares the behavior of the ns-3 model to the behavior of the target system, and makes adjustments to the abstract model and/or the ns-3 model to improve the correlation. This part of the process is sometimes called calibrating the abstract model.
At the end of the validation process, it is desirable to have a number of repeatable results that can demonstrate to other users that a given ns-3 model is faithful to a given abstract model and that this abstract model is, in turn, a faithful representation of the target system based on the initial qualification process. We call this collection of results the Validation Suite. You can think of this as the collection of experiments that have been used to validate a model, which can be run collectively as a suite of tests on the model for use by other users.
The Basic Process
The process of modeling some target system is then composed of three main parts: model qualification, model verification and model validation. As you might expect, we are only going to address model validation here. In the literature, Naylor and Finger call this piece of the puzzle Validation of Input-Output Transformations. Since the target system is often ultimately based on physical processes which are governed statistically, statistical comparisons between experiments done on the target system and simulations of experiments done on the ns-3 model will be in order in that case. Since the target system may also have to conform to a logical specification, deterministic experiments on the ns-3 model will be in order in that case.
The goal is to compare the behavior of the target system to the ns-3 model in some set of ways. We must then identify some behavior, or observable, to be validated and then design an experiment to determine whether or not the ns-3 model behaves in a way that is consistent with the target system in that respect. We want to propose tests that the ns-3 model would fail if it were not operating consistently with the target system. What does that really mean?
In the deterministic case, we are talking about software testing in the conventional sense. For example, if a model is placed in state X and is requested to perform operation Y, it should perform a set of actions A, B, C. We believe this is really a case of model verification and we expect these kinds of tests to be done in the Testing framework.
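A deterministic verification test of this kind might look like the following sketch. Everything here -- the toy queue model, its limit, and the test -- is hypothetical illustration, not real ns-3 code:

```python
# Hypothetical sketch of a deterministic verification test:
# place the model in state X, request operation Y, and check
# that it performs the expected actions A, B, C.

class QueueModel:
    """Toy stand-in for an ns-3 model with observable state."""
    def __init__(self, limit=2):
        self.limit = limit
        self.items = []
        self.dropped = 0

    def enqueue(self, pkt):
        # Operation Y: enqueue a packet, dropping when the queue is full.
        if len(self.items) >= self.limit:
            self.dropped += 1       # action: count the drop
        else:
            self.items.append(pkt)  # action: store the packet

def test_enqueue_drops_when_full():
    q = QueueModel(limit=2)    # state X: empty queue with limit 2
    q.enqueue("a")
    q.enqueue("b")
    q.enqueue("c")             # operation Y on a full queue
    assert q.items == ["a", "b"]   # actions A, B: first two stored
    assert q.dropped == 1          # action C: third one dropped

test_enqueue_drops_when_full()
```

Tests of this shape are deterministic and repeatable, which is why they belong in the regular testing framework rather than in a statistical validation suite.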
In the statistical case, we are talking about a random variable -- a quantity that has no single definite value, but rather an ensemble of values that it can assume, varying according to some probability distribution. For some set of conditions, then, the measurements of the random variable taken on the target system will have some distribution with moments such as the expectation value (mean) and the variance. If we run the same experiment on the ns-3 model under identical simulated conditions, measurements of the random variable will also have some distribution. In order to validate the ns-3 model, we need to demonstrate that measurements of the ns-3 model observable are drawn from the same distribution as measurements of the target system observable, to some level of statistical significance. In other words, we are looking to support the null hypothesis (H0) that the two sets of measurements are drawn from the same distribution. The chi-squared test for goodness-of-fit is commonly used in such situations.
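The chi-squared comparison can be sketched in plain Python. This is a toy illustration, not ns-3 code: the "target system" distribution, the simulated measurements, and the significance level are all assumed for the example.

```python
import random

def chi_squared_stat(observed, expected):
    """Pearson chi-squared statistic over matching bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# "Target system" distribution: expected counts per bin for 400 samples,
# assumed uniform over four bins for this example.
expected = [100, 100, 100, 100]

# "ns-3 model" measurements: 400 simulated draws binned the same way.
random.seed(42)
observed = [0, 0, 0, 0]
for _ in range(400):
    observed[random.randrange(4)] += 1

chi2 = chi_squared_stat(observed, expected)

# Critical value for 3 degrees of freedom at the 5% significance level
# (a standard chi-squared table entry).
CRITICAL_5PCT_DF3 = 7.815

# If chi2 is below the critical value, we fail to reject H0: the model's
# measurements are consistent with the target distribution.
consistent = chi2 < CRITICAL_5PCT_DF3
print(f"chi2 = {chi2:.3f}, consistent with H0: {consistent}")
```

In a real validation suite, the expected counts would come from measurements of the target system rather than an assumed distribution, and a library routine (e.g., one from SciPy) would typically replace the hand-rolled statistic.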
What Does This Say About Requirements?
It seems that the description of the problem above leads us to conclude that the Validation toolkit in ns-3 is really a framework for statistical analysis of experimental data. There are three basic pieces to the puzzle:
- For an experiment/test, how does one collect and organize data collected from the target system and determine and specify the real distributions of the random variables;
- For an experiment/test, how does one collect and organize data collected from the ns-3 model and determine and specify the simulated distributions of the random variables;
- How does one actually perform the tests that determine whether the ns-3 model passes or fails (is consistent or not consistent with the target system).
We clearly don't want to get into the business of organizing experiments done on the target system, but we do need to figure out how to get information about the results of real experiments into the ns-3 validation framework as some form of reduced data description.
We do need to be able to run simulations in ns-3 in order to collect data generated by the ns-3 models under validation. This implies some kind of statistics gathering framework, perhaps like Joe Kopena's framework.
We need to be able to perform statistical analysis on the gathered data in order to reduce the data and we need to be able to perform various tests of statistical inference such as chi-square and least-squares fitting to do the null hypothesis testing.
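As a sketch of the least-squares side of that analysis, the following fits a line to hypothetical measurements with ordinary least squares; the data and the assumed linear relationship are made up for illustration.

```python
def least_squares_fit(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical example: delay measurements that should grow roughly
# linearly with offered load if the model matches the target system.
load  = [1, 2, 3, 4, 5]
delay = [2.1, 4.0, 6.2, 7.9, 10.1]   # made-up data, roughly y = 2x

slope, intercept = least_squares_fit(load, delay)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The fitted parameters (and their residuals) are what one would then compare against the corresponding fit for the target system's measurements.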
It seems that, at a basic level, we are talking about:
- A data-gathering toolkit reminiscent of the stats framework that allows us to run sets of experiments that generate data from the ns-3 models under test;
- A data-reduction toolkit that allows us to take the generated data and reduce it to some distribution with associated moments;
- A statistical analysis toolkit that allows us to make comparisons between an expected distribution and a measured distribution;
- A testing framework that allows us to drive all of this automatically so we can use it in a "super" regression test.
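The data-reduction step above can be sketched as follows: raw measurements from an experiment are reduced to a distribution summary (sample size, mean, variance) that the analysis toolkit can then compare against an expected distribution. The Gaussian "measurements" here are stand-ins for real model output.

```python
import random

def reduce_samples(samples):
    """Reduce raw measurements to a distribution summary with
    the sample mean and unbiased sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return {"n": n, "mean": mean, "variance": var}

# Hypothetical "experiment": gather 1000 throughput samples from a
# model run; here we fake them as draws from a Gaussian(10, 2).
random.seed(1)
samples = [random.gauss(10.0, 2.0) for _ in range(1000)]

summary = reduce_samples(samples)
print(summary)
```

A driver in the testing framework would run such a reduction for each experiment in the Validation Suite and hand the summaries to the statistical-comparison step.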
Craigdo 19:24, 1 April 2009 (UTC)