Bugzilla – Bug 1927
reduce the frequency of random variable stream generators test suite failures
Last modified: 2020-06-04 19:58:19 EDT
reported by Peter here: http://mailman.isi.edu/pipermail/ns-developers/2014-June/012050.html another possibility is to relax the tolerance of the test
[Copied from the list] It finally dawned on me how to drastically reduce the rate of these failures, which are intrinsically probabilistic. There is a modest probability that any single execution of the test will fail. IIRC the failure rate is ~5%. Wrap the original test. If it fails, retry. If the second succeeds, pass the test. With 5% failure rate, the combined failure rate (two failures in a row) is 0.2%, or about once per year of daily runs. You could extend the process to try three times, and reduce the rate to once every 22 years. We probably want to limit the maximum number of retries, or we'll grind forever re-executing true failures. This should be pretty simple to implement, but I'm really swamped and won't get to it for a while. Anyone with a summer student? Extra credit (and beer!) if you report the number of attempts in the pass/fail message (modify the test name?)
I agree with the suggestion to wrap this.
We had a couple of test failures today on this issue, so I started to look at fixing it. I looked into the idea of wrapping the test. It turns out not to be completely trivial to do this while preserving the use of test macros. I also was not sure about the value of trying to wrap vs. just changing the significance value of the chi-squared test. Running multiple chi-squared tests in the hopes that one succeeds will lessen the probability of failure but it is mixing unrelated techniques together. I first came up with a patch to set the seeds only once, and to allow user to override the random setting of seeds if the global seed or run numbers are changed from the default; this will allow deterministically seeded runs of the test suite. I next looked into wrapping the Uniform Random Variable chi-squared test. It seems like what was being done is to conduct N_RUNS, average the chi squared values obtained, and compare the average with the target. I wasn't sure where this methodology came from, so I decided to remove the current use of N_RUNS and let this variable instead set the number of retries of the underlying chi-squared test (without averaging). I also expanded the number of bins, fixed an off-by-one issue on the degrees of freedom, and specify it in terms of the Inverse P (rather than Q) CDF. I ran the patched test for 10000 triels at N_RUNS=3 and saw no failures. I doubt that we'll even see them at N_RUNS=2. If this is acceptable, I would do some similar adjustments of the chi-squared tests in the rest of the test suite, and check it in after testing. Any comments?
Created attachment 1976 [details] allow deterministic seeds
Created attachment 1977 [details] TestCase rewrite similar pattern would be followed for other chi-squared tests
Created attachment 1978 [details] Factor out chi-square test. I got as far as factoring out the chi-square code so the implementation wasn't duplicated in each test case. Patch attached. I'll have to read Tom's suggestion when I can give it more time.
(In reply to Peter Barnes from comment #6) > Created attachment 1978 [details] > Factor out chi-square test. > > I got as far as factoring out the chi-square code so the implementation > wasn't duplicated in each test case. Patch attached. I'll have to read > Tom's suggestion when I can give it more time. I couldn't get this to compile on ns-3-dev tonight, although I support the goal of this refactoring.
I'd like to bump this for possible resolution because we still see these intermittent test failures. I earlier proposed a patch to allow deterministic seeds and, in a second patch, rewrote the chi-squared code to wrap multiple runs, as discussed above. Peter's patch went further in trying to refactor the chi-squared tests (but I couldn't get it to compile when I tried earlier). I'd like to suggest to merge now at least my two patches and leave open this bug (possibly renamed) for Peter's refactoring (unless Peter has cycles to work on refactoring it further this week).
+1
The first patch on deterministic seed was pushed in changeset 11624:5ed58455441a. After this commit, the following will always fail: NS_GLOBAL_VALUE="RngRun=231" ./test.py -s random-variable-stream-generators
The second patch was pushed in changeset 11625:07bfb4b5ad52 After this patch (which wraps the chi-squared test), the following test no longer fails (due to the wrapping): NS_GLOBAL_VALUE="RngRun=231" ./test.py -s random-variable-stream-generators This should help reduce the frequency of test suite failures in the buildslaves. Leaving this bug open for possible refactoring of the chi-squared test started by Peter...
*** Bug 1322 has been marked as a duplicate of this bug. ***
Moved to GitLab #220 https://gitlab.com/nsnam/ns-3-dev/-/issues/220