Bug 1927 – reduce the frequency of random variable stream generators test suite failures

Bug 1927 - reduce the frequency of random variable stream generators test suite failures


Summary:	reduce the frequency of random variable stream generators test suite failures

Status:	RESOLVED MOVED

Product:	ns-3
Classification:	Unclassified
Component:	core
Version:	pre-release
Hardware:	PC Linux

Importance:	P5 enhancement
Assigned To:	Tom Henderson

URL:
Keywords:

Duplicates:	1322 (view as bug list)
Depends on:
Blocks:
	Show dependency tree / graph

Reported:	2014-06-12 15:52 EDT by Tom Henderson
Modified:	2020-06-04 19:58 EDT (History)
CC List:	3 users (show)

See Also:	1322

Attachments
allow deterministic seeds (8.60 KB, patch) 2015-02-25 19:20 EST, Tom Henderson	Details \| Diff
TestCase rewrite (1.85 KB, patch) 2015-02-25 19:22 EST, Tom Henderson	Details \| Diff
Factor out chi-square test. (7.38 KB, patch) 2015-02-25 19:59 EST, Peter Barnes	Details \| Diff
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Tom Henderson 2014-06-12 15:52:28 EDT

reported by Peter here:
http://mailman.isi.edu/pipermail/ns-developers/2014-June/012050.html

another possibility is to relax the tolerance of the test

Comment 1 Peter Barnes 2014-09-12 18:46:32 EDT

[Copied from the list]
It finally dawned on me how to drastically reduce the rate of these failures, which are intrinsically probabilistic.  There is a modest probability that any single execution of the test will fail.  IIRC the failure rate is ~5%.

Wrap the original test.  If it fails, retry.  If the second succeeds, pass the test.  

With 5% failure rate, the combined failure rate (two failures in a row) is 0.2%, or about once per year of daily runs.  You could extend the process to try three times, and reduce the rate to once every 22 years.  We probably want to limit the maximum number of retries, or we'll grind forever re-executing true failures.

This should be pretty simple to implement, but I'm really swamped and won't get to it for a while.  Anyone with a summer student?  Extra credit (and beer!) if you report the number of attempts in the pass/fail message (modify the test name?)

Comment 2 Tom Henderson 2014-09-12 19:43:02 EDT

I agree with the suggestion to wrap this.

Comment 3 Tom Henderson 2015-02-25 19:20:15 EST

We had a couple of test failures today on this issue, so I started to look at fixing it.

I looked into the idea of wrapping the test. It turns out not to be completely trivial to do this while preserving the use of test macros. I also was not sure about the value of trying to wrap vs. just changing the significance value of the chi-squared test. Running multiple chi-squared tests in the hopes that one succeeds will lessen the probability of failure but it is mixing unrelated techniques together.

I first came up with a patch to set the seeds only once, and to allow user to override the random setting of seeds if the global seed or run numbers are changed from the default; this will allow deterministically seeded runs of the test suite.

I next looked into wrapping the Uniform Random Variable chi-squared test. It seems like what was being done is to conduct N_RUNS, average the chi squared values obtained, and compare the average with the target. I wasn't sure where this methodology came from, so I decided to remove the current use of N_RUNS and let this variable instead set the number of retries of the underlying chi-squared test (without averaging). I also expanded the number of bins, fixed an off-by-one issue on the degrees of freedom, and specify it in terms of the Inverse P (rather than Q) CDF.

I ran the patched test for 10000 triels at N_RUNS=3 and saw no failures. I doubt that we'll even see them at N_RUNS=2.

If this is acceptable, I would do some similar adjustments of the chi-squared tests in the rest of the test suite, and check it in after testing. Any comments?

Comment 4 Tom Henderson 2015-02-25 19:20:59 EST

Created attachment 1976 [details]
allow deterministic seeds

Comment 5 Tom Henderson 2015-02-25 19:22:00 EST

Created attachment 1977 [details]
TestCase rewrite

similar pattern would be followed for other chi-squared tests

Comment 6 Peter Barnes 2015-02-25 19:59:28 EST

Created attachment 1978 [details]
Factor out chi-square test.

I got as far as factoring out the chi-square code so the implementation wasn't duplicated in each test case.  Patch attached.  I'll have to read Tom's suggestion when I can give it more time.

Comment 7 Tom Henderson 2015-03-19 23:43:26 EDT

(In reply to Peter Barnes from comment #6)
> Created attachment 1978 [details]
> Factor out chi-square test.
> 
> I got as far as factoring out the chi-square code so the implementation
> wasn't duplicated in each test case.  Patch attached.  I'll have to read
> Tom's suggestion when I can give it more time.

I couldn't get this to compile on ns-3-dev tonight, although I support the goal of this refactoring.

Comment 8 Tom Henderson 2015-08-31 15:03:45 EDT

I'd like to bump this for possible resolution because we still see these intermittent test failures.

I earlier proposed a patch to allow deterministic seeds and, in a second patch, rewrote the chi-squared code to wrap multiple runs, as discussed above.  Peter's patch went further in trying to refactor the chi-squared tests (but I couldn't get it to compile when I tried earlier).

I'd like to suggest to merge now at least my two patches and leave open this bug (possibly renamed) for Peter's refactoring (unless Peter has cycles to work on refactoring it further this week).

Comment 9 Peter Barnes 2015-08-31 18:18:37 EDT

+1

Comment 10 Tom Henderson 2015-09-02 00:09:52 EDT

The first patch on deterministic seed was pushed in changeset 11624:5ed58455441a.

After this commit, the following will always fail:
NS_GLOBAL_VALUE="RngRun=231" ./test.py -s random-variable-stream-generators

Comment 11 Tom Henderson 2015-09-02 00:33:50 EDT

The second patch was pushed in changeset 11625:07bfb4b5ad52

After this patch (which wraps the chi-squared test), the following test no longer fails (due to the wrapping):
NS_GLOBAL_VALUE="RngRun=231" ./test.py -s random-variable-stream-generators

This should help reduce the frequency of test suite failures in the buildslaves.

Leaving this bug open for possible refactoring of the chi-squared test started by Peter...

Comment 12 Peter Barnes 2019-11-04 17:32:26 EST

*** Bug 1322 has been marked as a duplicate of this bug. ***

Comment 13 Peter Barnes 2020-06-04 19:58:19 EDT

Moved to GitLab #220
https://gitlab.com/nsnam/ns-3-dev/-/issues/220