Verification, Validation and Testing

There is often much confusion regarding the meaning of the words Verification, Validation and Testing; and other associated terminology. It will be worthwhile to spend a little time establishing exactly what we mean when we use them.

A computer model is a mathematical or logical representation of something. It can represent a vehicle, a frog or a networking card. Models can also represent processes such as global warming, freeway traffic flow or a specification of a networking protocol. Models can be completely faithful representations of a logical process specification, but by necessity they can never completely simulate a physical object or process. In most cases, a number of simplifications are made to the model to make simulation computationally tractable.

Every model has a target system that it is attempting to simulate. The first step in creating a simulation model is to identify this target system and the level of detail and accuracy that the simulation is desired to reproduce. In the case of a logical process, the target system may be identified as TCP as defined by RFC 793. In this case, it will probably be desirable to create a model that completely and faithfully reproduces RFC 793. In the case of a physical process this will not be possible. If, for example, you would like to simulate a wireless networking card, you may determine that you need, "an accurate MAC-level implementation of the 802.11 specification and [...] a not-so-slow PHY-level model of the 802.11a specification."

Once this is done, one can develop an abstract model of the target system. This is typically an exercise in managing the tradeoffs between complexity, resource requirements and accuracy. The process of developing an abstract model has been called model qualification in the literature. In the case of a TCP protocol, this process results in a design for a collection of objects, interactions and behaviors that will fully implement RFC 793 in ns-3. In the case of the wireless card, this process results in a number of tradeoffs to allow the physical layer to be simulated and the design of a network device and channel for ns-3, along with the desired objects, interactions and behaviors.

This abstract model is then developed into an ns-3 model that implements the abstract model as a computer program. The process of getting the implementation to agree with the abstract model is called model verification in the literature.

The process so far is open loop. What remains is to make a determination that a given ns-3 model has some connection to some reality -- that a model is an accurate representation of a real system, whether a logical process or a physical entity. If you are going to use a simulation model to try to predict how some real system is going to behave, you must have some reason to believe your results -- i.e., can you trust that an inference made from the model translates into a correct prediction for the real system? The process of getting the ns-3 model behavior to agree with the desired target system behavior as defined by the model qualification process is called model validation in the literature. In the case of a TCP implementation, you may want to compare the behavior of your ns-3 TCP model to some reference implementation in order to validate your model. In the case of a wireless physical layer simulation, you may want to compare the behavior of your model to that of real hardware in a controlled setting.

The process is usually described as a closed loop with variations on the following theme:

 target-system <---------------> abstract-model <--------------> ns-3 model
       ^         qualification                    verification      ^
       |                                                            |
       +------------------------------------------------------------+
                               validation

The following are the definitions we will use:

  • Domain of applicability: Prescribed conditions for which the model has been tested, compared against reality to the extent possible, and judged suitable for use;
  • Qualification: The process of defining the accuracy of a model in order to make a simulation tractable;
  • Range of accuracy: Demonstrated agreement between the computerized model and reality within a domain of applicability.
  • Simulation: Modeling of systems and their operations using various means of representation;
  • Reality: An entity, situation, or system selected for analysis -- a target-system;
  • Validation: Substantiation that a model, within its domain of applicability, possesses a satisfactory range of accuracy consistent with the intended application of the model;
  • Verification: Substantiation that the implementation of an abstract model is correct and performs as intended.

Note that we have not used the term software testing at all in this discussion. The process of qualification, verification and validation is really a research and development activity. Many of the checks implemented in the verification phase are ultimately reused in a software test suite, which tends to blur the tasks. Conceptually, however, neither qualification, verification nor validation has anything to do with software testing in its commonly understood sense. The goal of model verification and validation is, as is suggested by the definitions above, substantiation that a model does what is advertised.

You will find some of the same terms and concepts used in discussions of software testing, however. Software Testing is an investigation conducted to provide information about the quality of the product. This is more of a manufacturing process activity -- given a model that has been verified and validated, software testing ensures that the model can be reproduced accurately and used without unexpected errors. This is why software testing is sometimes called software quality control.

Without going too deeply into software test engineering, let's define some terms here as well:

  • Acceptance testing: Tests performed prior to introducing a model into the main build or testing process;
  • Integration testing: Tests for defects in the interfaces and interaction between units. Progressively larger groups of units may be integrated and tested;
  • Performance testing: Tests to verify that models can handle large quantities of data (sometimes referred to as Load Testing);
  • Regression testing: Tests performed to uncover functionality that previously worked correctly but has stopped working as intended;
  • System testing: Checks that a completely integrated system meets its requirements;
  • Unit testing: Tests minimal software components, or modules. Each unit is tested to verify that the detailed design for the unit has been correctly implemented;
  • Usability testing: Verifies that user interfaces are easy to use and understand;
  • Verification: A determination that the product has been built according to its specifications;
  • Validation: A determination that the system meets its intended needs and that the specifications were correct.

Note the reappearance of the terms Verification and Validation here with subtly changed meanings. These activities close the product development loop in the same way that Validation and Verification close the model development loop. These tasks are similar but not identical and are most often performed by people in entirely different roles. In many cases, it seems, regression testing is confused with verification or validation. These are actually wildly different activities with divergent goals.

That said, there is absolutely nothing wrong with code reuse. It is possible, and desirable, to reuse tests done for model validation and verification in the software test domain. For example, it would be very useful to automate the test suite used to verify and validate a given model and use those tests as verification, validation and regression tests in the software test sense.

The deliverables for ns-3 model verification and validation will be something like web or wiki pages detailing what behaviors have been validated. If a particular behavior is verified or validated, the final output of the validation or verification test should be something like CONFORMS or DOES NOT CONFORM. On the other hand, the deliverables for a software test suite will be something like a PASS or FAIL indication. The same code can be used in both cases. If a model validation or verification test is incorporated into a nightly regression test, the output CONFORMS is interpreted as CONTINUES TO CONFORM, and the output DOES NOT CONFORM is interpreted as REGRESSION ERROR.
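
To make that reuse concrete, here is a minimal sketch, in C++ (the language of ns-3), of how a single validation check reporting CONFORMS or DOES NOT CONFORM could be wrapped so that a regression harness sees PASS or FAIL. The ValidateTcpRetransmissionTimer function is a hypothetical placeholder and does not correspond to any existing ns-3 API.

 #include <iostream>

 // Hypothetical outcome of a single model validation check.
 enum ValidationResult { CONFORMS, DOES_NOT_CONFORM };

 // Placeholder standing in for a real validation check on an ns-3 model.
 ValidationResult ValidateTcpRetransmissionTimer (void)
 {
   bool agreesWithReference = true; // the real comparison would go here
   return agreesWithReference ? CONFORMS : DOES_NOT_CONFORM;
 }

 // When the same check runs in a nightly regression test, CONFORMS is
 // read as CONTINUES TO CONFORM and DOES NOT CONFORM as REGRESSION ERROR.
 int main (void)
 {
   if (ValidateTcpRetransmissionTimer () == CONFORMS)
     {
       std::cout << "PASS (continues to conform)" << std::endl;
       return 0;
     }
   std::cout << "FAIL (regression error: does not conform)" << std::endl;
   return 1;
 }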

The ns-3 verification, validation and testing project will produce tools and environments that make it as easy as possible to create the various kinds of tests used in the sundry domains we have described. The frameworks will try to make it easy to reuse code in model-related and software test-related cases. We will also provide a number of examples to show how we think the different testing tasks should be done.

Kinds of Validation

The process used to validate a model is conceptually quite simple. One compares the behavior of the ns-3 model to the behavior of the target system, and makes adjustments to the abstract model and/or the ns-3 model to improve the correlation. This part of the process is sometimes called calibrating the model.

As mentioned above, at the end of the validation process, it is desirable to have a number of repeatable results that can demonstrate to other users that a given ns-3 model is faithful to a given abstract model and that this abstract model is, in turn, a faithful representation of the target system based on the initial qualification process. We call this collection of results the Validation Suite. You can think of this as the collection of experiments that have been used to validate a model, which can be run collectively as a suite of tests on the model for use by other users. The suite can also be run as part of the software test strategy as mentioned above.

These validation suites can be composed of deterministic tests, which are used to validate process-oriented models such as the TCP implementation of RFC 793 mentioned above, or stochastic tests, which are used to validate physical processes. In both cases one wants to provide inputs to a model and observe that the outputs behave as expected. In the literature, Naylor and Finger call this piece of the puzzle Validation of Input-Output Transformations.

Validating Models Using Stochastic Methods

In this case, the part of the target system to be validated is ultimately based on physical processes that are governed statistically, so statistical comparisons between experiments done on the target system and simulated experiments done on the ns-3 model are in order. These techniques might be used to validate the "not-so-slow PHY-level model of the 802.11a specification" example given above.

The goal is to compare the behavior of the target system to that of the ns-3 model in some set of ways. We must identify some behavior, or observable, to be validated and then design an experiment to determine whether or not the ns-3 model behaves in a way that is consistent with the target system in that respect. We want to propose tests that the ns-3 model would fail if it were not operating consistently with the target system. What does that really mean?

In the stochastic case, we are talking about a random variable -- a quantity that has no definite value, but rather has an ensemble of values that it can assume. This ensemble of values varies according to some probability distribution. For some set of conditions then, the measurements of the random variable taken on the target system will have some distribution with some number of moments such as the expectation value (mean) and a variance. If we run the same experiment on the ns-3 model under identical simulated conditions, measurements of the random variable will also have some distribution. In order to validate the ns-3 model, we need to demonstrate that measurements of the ns-3 model observable are drawn from the same distribution as measurements of the target system observable to some level of statistical significance. In other words, we are looking to support the null hypothesis (H0). The chi-squared test for goodness-of-fit is commonly used in such situations.
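
As a hedged sketch of the kind of comparison involved, and not part of any existing ns-3 statistics API, the following computes Pearson's chi-squared statistic from binned measurements of the ns-3 observable against expected counts derived from the target-system distribution; the statistic is then compared against the critical value for the chosen significance level and number of degrees of freedom. The bin counts below are purely illustrative.

 #include <cstddef>
 #include <vector>
 #include <iostream>

 // Pearson's chi-squared statistic: the sum over bins of
 // (observed - expected)^2 / expected. Expected counts come from the
 // target-system distribution; observed counts come from the ns-3
 // model measured under identical simulated conditions.
 double ChiSquaredStatistic (const std::vector<double> &observed,
                             const std::vector<double> &expected)
 {
   double chi2 = 0.0;
   for (std::size_t i = 0; i < observed.size (); ++i)
     {
       double diff = observed[i] - expected[i];
       chi2 += diff * diff / expected[i];
     }
   return chi2;
 }

 int main (void)
 {
   // Illustrative bin counts only; real counts would come from the
   // data-gathering and data-reduction steps described below.
   std::vector<double> observed;
   std::vector<double> expected;
   observed.push_back (18); expected.push_back (20);
   observed.push_back (55); expected.push_back (50);
   observed.push_back (27); expected.push_back (30);

   double chi2 = ChiSquaredStatistic (observed, expected);
   // With three bins there are two degrees of freedom; compare chi2
   // against the critical value for the chosen significance level
   // (5.99 at the 0.05 level) to decide whether to reject H0.
   std::cout << "chi-squared = " << chi2 << std::endl;
   return 0;
 }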

It seems that the description of the problem above leads us to conclude that the stochastic part of the validation toolkit in ns-3 is really a framework for statistical analysis of experimental data. There are three basic pieces to the puzzle:

  • For an experiment/test, how does one collect and organize data collected from the target system and determine and specify the real distributions of the random variables;
  • For an experiment/test, how does one collect and organize data collected from the ns-3 model and determine and specify the simulated distributions of the random variables;
  • How does one actually perform the tests that determine whether the ns-3 model passes or fails (is consistent or not consistent with the target system).

We clearly don't want to get into the business of organizing experiments done on the target system, but we do need to figure out how to get information about the results of real experiments into the ns-3 validation framework as some form of reduced data description.

We do need to be able to run simulations in ns-3 in order to collect data generated by the ns-3 models under validation. This implies some kind of statistics gathering framework, perhaps like Joe Kopena's framework.

We need to be able to perform statistical analysis on the gathered data in order to reduce the data and we need to be able to perform various tests of statistical inference such as chi-square and least-squares fitting to do the null hypothesis testing.

It seems that, at a basic level, we are talking about:

  1. A data-gathering toolkit reminiscent of the stats framework that allows us to run sets of experiments that generate data from the ns-3 models under test;
  2. A data-reduction toolkit that allows us to take the generated data and reduce it to some distribution with associated moments;
  3. A statistical analysis toolkit that allows us to make comparisons between an expected distribution and a measured distribution;
  4. A toolkit that allows for proper display of statistical data for inclusion in the web site that is the deliverable of the validation process;
  5. A testing framework that allows us to drive all of this automatically so we can use the validation test suite in the software test environment (a rough sketch tying these pieces together follows this list).
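
The skeleton below is one way these pieces might fit together in code. None of the function names correspond to an existing ns-3 or stats-framework API; they are placeholders marking where each toolkit would plug in, and the data they produce here is meaningless.

 #include <cstddef>
 #include <cstdlib>
 #include <vector>
 #include <iostream>

 // 1. Data gathering (stub): run the simulated experiments and collect
 //    raw samples of the observable from the ns-3 model under test.
 std::vector<double> RunExperiments (unsigned int nRuns)
 {
   std::vector<double> samples;
   for (unsigned int i = 0; i < nRuns; ++i)
     {
       samples.push_back (std::rand () / (double)RAND_MAX); // placeholder data
     }
   return samples;
 }

 // 2. Data reduction (stub): bin the raw samples into a histogram so
 //    they can be compared against the target-system distribution.
 std::vector<double> ReduceToHistogram (const std::vector<double> &samples,
                                        unsigned int nBins)
 {
   std::vector<double> counts (nBins, 0.0);
   for (std::size_t i = 0; i < samples.size (); ++i)
     {
       unsigned int bin = (unsigned int)(samples[i] * nBins);
       if (bin >= nBins) bin = nBins - 1;
       counts[bin] += 1.0;
     }
   return counts;
 }

 // 3. Statistical comparison: the chi-squared statistic sketched earlier.
 double ChiSquaredStatistic (const std::vector<double> &observed,
                             const std::vector<double> &expected)
 {
   double chi2 = 0.0;
   for (std::size_t i = 0; i < observed.size (); ++i)
     {
       double diff = observed[i] - expected[i];
       chi2 += diff * diff / expected[i];
     }
   return chi2;
 }

 // 4./5. Reporting and automation: emit CONFORMS / DOES NOT CONFORM so
 //       the same run can feed both the validation pages and a nightly
 //       regression harness.
 int main (void)
 {
   std::vector<double> samples  = RunExperiments (100);
   std::vector<double> observed = ReduceToHistogram (samples, 4);
   std::vector<double> expected (4, 25.0); // reduced description of the target system
   double chi2 = ChiSquaredStatistic (observed, expected);
   double criticalValue = 7.81; // chi-squared, 3 degrees of freedom, 0.05 level
   std::cout << (chi2 < criticalValue ? "CONFORMS" : "DOES NOT CONFORM") << std::endl;
   return 0;
 }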

Validating Models Using Deterministic Methods

In this case, the part of the target system to be validated is ultimately based on a logical specification, so deterministic tests are in order. These techniques might be used to validate the "TCP as defined by RFC 793" example given above.

The goal is to compare the behavior of the target system to that of the ns-3 model in some set of ways. We must identify some behavior, or observable, to be validated and then design an experiment to determine whether or not the ns-3 model behaves in a way that is consistent with the target system in that respect. We want to propose tests that the ns-3 model would fail if it were not operating consistently with the target system. What does that really mean?

In the deterministic case, we are talking about a repeatable, definite response. There are three basic pieces to the puzzle (a small illustrative sketch follows this list):

  • For an experiment/test, how does one determine the expected response of the system;
  • For an experiment/test, how does one capture the response of the ns-3 model;
  • How does one actually perform the tests that determine whether the ns-3 model passes or fails (is consistent or not consistent with the target system).
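
As a hedged illustration (the function and values here are invented for this sketch, not taken from any ns-3 code), a deterministic check amounts to feeding the model an input prescribed by the specification and comparing its response field-for-field against the response the specification requires. For example, RFC 793 requires a SYN+ACK to acknowledge the initiator's sequence number plus one, which gives a simple expected response:

 #include <iostream>

 // Hypothetical, much-simplified stand-in for an ns-3 TCP model: given
 // the sequence number of an incoming SYN, return the acknowledgment
 // number carried by the SYN+ACK the model emits.
 unsigned int ModelSynAckNumber (unsigned int synSequenceNumber)
 {
   return synSequenceNumber + 1; // placeholder for driving the real model
 }

 int main (void)
 {
   // Expected response taken from the specification: RFC 793 requires
   // the SYN+ACK to acknowledge the initiator's sequence number plus one.
   unsigned int synSeq = 12345;
   unsigned int expectedAck = synSeq + 1;

   unsigned int actualAck = ModelSynAckNumber (synSeq);
   if (actualAck == expectedAck)
     {
       std::cout << "CONFORMS" << std::endl;
       return 0;
     }
   std::cout << "DOES NOT CONFORM" << std::endl;
   return 1;
 }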
