Bug 555 – DCF immediate access bug

Bug 555 - DCF immediate access bug


Summary:	DCF immediate access bug

Status:	RESOLVED FIXED

Product:	ns-3
Classification:	Unclassified
Component:	wifi
Version:	ns-3-dev
Hardware:	All All

Importance:	P2 critical
Assigned To:	Nicola Baldo

URL:
Keywords:

Depends on:
Blocks:
	Show dependency tree / graph

Reported:	2009-04-24 03:52 EDT by TimoB
Modified:	2016-09-28 00:15 EDT (History)
CC List:	9 users (show)

See Also:	2369

Attachments
DCF-collision-bug.patch (1.98 KB, patch) 2009-04-24 03:52 EDT, TimoB	Details \| Diff
DCF collision (10.85 KB, patch) 2009-05-11 11:10 EDT, TimoB	Details \| Diff
DCF collision (14.82 KB, patch) 2009-06-08 11:08 EDT, TimoB	Details \| Diff
New patch to work with revision 5762:ae78a8de6f5f (18.25 KB, patch) 2009-11-20 10:49 EST, Faker Moatamri	Details \| Diff
Updated patch against 5768:a07d60db8448 (18.22 KB, patch) 2009-11-23 13:16 EST, TimoB	Details \| Diff
delay StartAccessIfNeeded (4.37 KB, patch) 2010-11-08 18:58 EST, Nicola Baldo	Details \| Diff
schedule EndTxNoAck in MacLow (5.24 KB, patch) 2010-11-10 13:31 EST, Nicola Baldo	Details \| Diff
tentative test case (4.44 KB, patch) 2010-12-21 12:57 EST, Nicola Baldo	Details \| Diff
test program (5.37 KB, text/x-c++src) 2010-12-21 16:10 EST, Nicola Baldo	Details
Short program showing non-uniform distribution of backoff (5.02 KB, text/x-c++src) 2011-02-28 23:41 EST, Quincy Tse	Details
added slot counting to the original test program (6.94 KB, text/x-c++src) 2011-03-29 13:46 EDT, Nicola Baldo	Details
new program working with latest ns-3-dev (5.09 KB, text/x-c++src) 2011-04-10 15:50 EDT, Nicola Baldo	Details
Quincy's short program showing non-uniform distribution of backoff (5.17 KB, text/x-c++src) 2012-04-25 18:37 EDT, Tom Henderson	Details
revised test program (6.02 KB, text/x-c++src) 2012-04-26 12:48 EDT, Nicola Baldo	Details
log produced by the test program (13.90 KB, text/plain) 2012-04-26 12:51 EDT, Nicola Baldo	Details
slightly revised test program (7.17 KB, text/x-c++src) 2012-08-30 11:55 EDT, Tom Henderson	Details
yet another revised test program (7.18 KB, text/x-c++src) 2012-10-30 11:27 EDT, Nicola Baldo	Details
schedule EndTxNoAck in MacLow (revised for current ns-3-dev) (5.19 KB, patch) 2012-11-02 10:27 EDT, Daniel L.	Details \| Diff
Test program demonstrating problem still exists (5.79 KB, text/plain) 2013-08-07 02:45 EDT, Quincy Tse	Details
Show Obsolete (14) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description TimoB 2009-04-24 03:52:29 EDT

Created attachment 427 [details]
DCF-collision-bug.patch

DcfManager handes immediate access to the medium incorrectly:
That the backoff counter == 0 is an incorrect criterion, the backoff end must have elapsed.

This occurs for two subsequenct unACKed frames, where the second gets 0 backoff time. Then the access mechanisms incorrectly signals an immediate collision instead of waiting AIFS.

Furthermore, immediate medium access should be possible at simulator start, because all backoffs should be regarded as elapsed. This is easily fixed by setting negative start times.

The patch applies these fixes. It modifies wifi-test.cc validation and two ref-traces.

Comment 1 Mathieu Lacage 2009-04-24 04:33:04 EDT

(In reply to comment #0)
> Created an attachment (id=427) [details]
> DCF-collision-bug.patch
> 
> DcfManager handes immediate access to the medium incorrectly:
> That the backoff counter == 0 is an incorrect criterion, the backoff end must
> have elapsed.
> 
> This occurs for two subsequenct unACKed frames, where the second gets 0 backoff
> time. Then the access mechanisms incorrectly signals an immediate collision
> instead of waiting AIFS.
> 
> Furthermore, immediate medium access should be possible at simulator start,
> because all backoffs should be regarded as elapsed. This is easily fixed by
> setting negative start times.
> 
> The patch applies these fixes. It modifies wifi-test.cc validation and two
> ref-traces.

I think that you are saying that this breaks validation of dcf-manager-test.cc. If so, you need to provide a patch to fix it. Adding a testcase to DcfManagerTest which demonstrates the old buggy behavior and the new correct behavior would also be needed.

I should point out that there are a couple of unneeded changes in this patch (Changing MicroSeconds (0) to Seconds (0.0)): This piece of code is really tricky and critical so, patches to it should really really be minimal: if you wish to make non-functional changes later to change MicroSeconds to Seconds, that is fine with me.

Comment 2 Mathieu Lacage 2009-04-24 04:39:26 EDT

(In reply to comment #0)
> Created an attachment (id=427) [details]
> DCF-collision-bug.patch
> 
> DcfManager handes immediate access to the medium incorrectly:
> That the backoff counter == 0 is an incorrect criterion, the backoff end must
> have elapsed.
> 
> This occurs for two subsequenct unACKed frames, where the second gets 0 backoff
> time. Then the access mechanisms incorrectly signals an immediate collision
> instead of waiting AIFS.

I am sorry but I really don't see what is the relationship between a 0 backoff time and an immediate collision (I would expect that if the backoff is zero, there is no immediate collition). To re-iterate my previous comment, you really need to first provide a testcase for that bug in dcf-manager-test.cc with your fix.

> 
> Furthermore, immediate medium access should be possible at simulator start,
> because all backoffs should be regarded as elapsed. This is easily fixed by
> setting negative start times.

This is really an arbitrary decision and I don't believe it matters in any way so, if I had to choose between the current behavior and the new behavior, I would pick the one which minimizes the number of changes to the code. Either way, you really have to leave this change out of what you describe as a more real bug for 2 subsequent unacked frames.

Comment 3 TimoB 2009-05-11 11:09:13 EDT

Okay, the situation is even more complex than I thought.

I've extended DcfManagerTest to pinpoint the bug. To do that I had to add alot of code, because DcfManagerTest didnt perform a backoff for successful transmission. So that added could write the test case for the bug:

  // Test case where two packets are queued at DcaTxop, the first send starts
  // immediately and the second is also immediately indicated but has to wait
  // for its backoff.
  //
  //  20          60     66      70   80
  //   | tx        | sifs | aifs  | tx |
  //   |20: request access for both packets

Where the backoff of the second tx is 0. In original ns-3-dev, this test yields a collision for the second tx at 20, where no collision should be indicated, because the medium is busy from the first tx.

The DcfManagerTest patch contains a second test case with backoff=2 which doesnt throw a collision, to highlight the problem with =0.

The patch I previously submitted creates more problems than fixes.

I've added a new patch, which adds a bool m_backoffElapsed to DcfState. I have not found a way to do it without a new attribute. With the new patch the added test cases pass ok.

See below about the simulator start.

(In reply to comment #2)
> > 
> > Furthermore, immediate medium access should be possible at simulator start,
> > because all backoffs should be regarded as elapsed. This is easily fixed by
> > setting negative start times.
> 
> This is really an arbitrary decision and I don't believe it matters in any way
> so, if I had to choose between the current behavior and the new behavior, I
> would pick the one which minimizes the number of changes to the code. Either
> way, you really have to leave this change out of what you describe as a more
> real bug for 2 subsequent unacked frames.

Yes, it is independent from the first issue. Of course "no changes" are always the mimimum number of changes :). It's not really an arbitrary decision to make simulator start a special point in time for backoff management.

Comment 4 TimoB 2009-05-11 11:10:06 EDT

Created attachment 438 [details]
DCF collision

Comment 5 TimoB 2009-06-08 11:08:21 EDT

Created attachment 461 [details]
DCF collision

Updated for current ns-3-dev.
The problem still persists. I also corrected the last of the new ACK test cases.

Comment 6 Mathieu Lacage 2009-06-23 07:38:16 EDT

kirill, I noticed that this patch is changing one of the tests you added for ack timeout:

// this situation is not supposed to fire an internal or external  collison.                                                             
// ExpectCollision (39, 2, 0); // backoff: 2 slot                                        

what do you think about this change ?

Comment 7 Kirill V. Andreev 2009-06-23 07:53:19 EDT

(In reply to comment #6)
> kirill, I noticed that this patch is changing one of the tests you added for
> ack timeout:
> 
> // this situation is not supposed to fire an internal or external  collison.    
> // ExpectCollision (39, 2, 0); // backoff: 2 slot                               
> 
> what do you think about this change ?
> 
Ok, I agree with this changes: we should just wait till medium becomes idle rather than think that it was collision.

Comment 8 Mathieu Lacage 2009-06-23 08:02:14 EDT

I have to confess that I really don't understand the purpose of this test:

  // Test case where two packets are queued at DcaTxop, the first send starts   
  // immediately and the second is also immediately indicated but has to wait   
  // for its backoff.                                                           
  //                                                                            
  //  20          60     66      70   80                                        
  //   | tx        | sifs | aifs  | tx |                                        
  //   |20: request access for both packets                                     
  StartTest (4, 6 , 10);
  AddDcfState (1);
  AddAccessRequest (20, 40, 20, 0, 0);
  AddAccessRequest (20, 10, 70, 0, 0);
  EndTest ();

If I read the code of dca-txop.cc correctly, I think that this set of access
requests is impossible. i.e., DcaTxop::StartAccessIfNeeded appears to be
checking for IsAccessRequested before calling RequestAccess so, it should not
be possible for a specific queue to request access again before the first
access has been resolved. 

Timo, could you explain what you are trying to test here and why you think that
the above is a sequence of events which can happen ?

Comment 9 TimoB 2009-06-24 05:34:41 EDT

(In reply to comment #8)
> Timo, could you explain what you are trying to test here and why you think that
> the above is a sequence of events which can happen ?

Yes, it is true that RequestAccess() is called exactly once for each transmission. However it can still happen, that it is called twice at the same simulated time point.

Consider a DcaTxop with two packets Enqueued(), both are put into the queue and RequestAccess() is called. When NotifyAccessGranted() is invoked, the first packet transmission starts immediately via MacLow. Still in the same procedure StartAccessIfNeeded() is called in NotifyAccessGranted() and RequestAccess() is signaled for the second packet.

Thus two RequestAccess() happen the same time, if the medium is idle for the first one and transmission can start immediately.

Comment 10 Mathieu Lacage 2009-06-24 05:44:34 EDT

(In reply to comment #9)
> (In reply to comment #8)
> > Timo, could you explain what you are trying to test here and why you think that
> > the above is a sequence of events which can happen ?
> 
> Yes, it is true that RequestAccess() is called exactly once for each
> transmission. However it can still happen, that it is called twice at the same
> simulated time point.
> 
> Consider a DcaTxop with two packets Enqueued(), both are put into the queue and
> RequestAccess() is called. When NotifyAccessGranted() is invoked, the first
> packet transmission starts immediately via MacLow. Still in the same procedure
> StartAccessIfNeeded() is called in NotifyAccessGranted() and RequestAccess() is
> signaled for the second packet.
> 
> Thus two RequestAccess() happen the same time, if the medium is idle for the
> first one and transmission can start immediately.
> 

So, if I understand you well, you are saying that, in this case, we bogusly generate a collision, right ?

Comment 11 TimoB 2009-06-24 06:08:15 EDT

> So, if I understand you well, you are saying that, in this case, we bogusly
> generate a collision, right ?

The bug is that there should be _no_ collision indicated.

Without the patch this situation signals a collision if backoff == 0 for the second packet.

The bad condition is: IsBusy() and backoff == 0. Occurs because the first packet obviously is in TX and if the second gets backoff == 0, a collision is raised. And that is wrong.

Comment 12 Mathieu Lacage 2009-06-25 02:38:18 EDT

Would it not be possible to instead fix this bug my adding an end of tx signal to MacLow, forward it to DcaTxop, and invoke RequestAccessIfNeeded from there at the end of tx for broadcast/multicast messages ?

This would add yet another event, but, at least, it seems pretty straightforward: the way the current patch works makes my brain hurt.

Comment 13 TimoB 2009-06-25 13:05:09 EDT

(In reply to comment #12)
> Would it not be possible to instead fix this bug my adding an end of tx signal
> to MacLow, forward it to DcaTxop, and invoke RequestAccessIfNeeded from there
> at the end of tx for broadcast/multicast messages ?
> 
> This would add yet another event, but, at least, it seems pretty
> straightforward: the way the current patch works makes my brain hurt.

No, it does not change anything about the bug reported.
If you delay RequestAccessIfNeeded() to after the TX the test case changes to:

  // Test case where two packets are queued at DcaTxop, the first send starts
  // immediately and the second is also immediately indicated but has to wait
  // for its backoff.
  //
  //  20          60     66      70   80
  //   | tx        | sifs | aifs  | tx |
  //   |20: request access for first packet
  //               |60: request access for second packet

and we still need to differentate this case from

  //  20          60     66              70      75
  //   | tx        | sifs | aifs+backoff  | idle | tx |
  //   |20: request access for first packet
  //                                             |75: request access for second packet

I don't see what's wrong with adding a "backoff elapsed" flag to DcaTxop.

Comment 14 Craig Dowell 2009-10-02 14:18:19 EDT

On the critical path for ns-3.7

Comment 15 Faker Moatamri 2009-11-20 10:49:55 EST

Created attachment 665 [details]
New patch to work with revision 5762:ae78a8de6f5f

This is an updated patch that should apply to revision 5762:ae78a8de6f5f of ns-3-dev. I disabled the last two tests in the dcf-manager-test.cc because they kept on failing.

Comment 16 TimoB 2009-11-23 13:16:03 EST

Created attachment 670 [details]
Updated patch against 5768:a07d60db8448

Updated by pulling into old hg repository.

Main problem with your patch was:
+  NS_TEST_EXPECT_MSG_EQ (Simulator::Now (), MicroSeconds (expected.m_expectedGrantTime), "Expected access grant is now");
-  NS_TEST_EXPECT_MSG_EQ (Simulator::Now (), MicroSeconds (expected.second), "Expected access grant is now");

wrong value there.

Updated everything else. The updated test methods allow (require) specification of the backoff slots after each AIFS.

Timo

Comment 17 Nicola Baldo 2010-05-14 10:57:15 EDT

(In reply to comment #12)
> Would it not be possible to instead fix this bug my adding an end of tx signal
> to MacLow, forward it to DcaTxop, and invoke RequestAccessIfNeeded from there
> at the end of tx for broadcast/multicast messages ?
> 
> This would add yet another event, but, at least, it seems pretty
> straightforward: the way the current patch works makes my brain hurt.


I favor this approach (add one end-of-tx notification) rather than the previously proposed patch (the backoffElapsed approach), because it is a cleaner solution. Any comments?

Comment 18 Nicola Baldo 2010-11-01 01:56:22 EDT

I spent more time examining this issue, and my conclusion is the same as before: Mathieu's proposal of delaying RequestAccessIfNeeded will work. This is my reasoning:

1) normally there is always a signal generated when a frame exchange terminates  (e.g., received ACK, missed ACK, missed CTS...)

2) in this case, RequestAccessIfNeeded is called only *after* the frame exchange is terminated

3) this is not true for broadcast packets: in this case, we have no signal at the *end* of the frame exchange. Unlike all other types of frame exchanges, RequestAccessIfNeeded is called *before* the frame exchange is terminated. Hence the DcfState is *always* busy (due to the broadcast still being transmitted), and whenever a zero backoff is generated a collision will result.

4) by generating an EndOfBroadcast event, RequestAccessIfNeeded will be called correctly, i.e., after the previous frame exchange has terminated. 

I'll try to post a patch for the proposed solution as soon as possible.

Comment 19 Nicola Baldo 2010-11-08 18:58:57 EST

Created attachment 1010 [details]
delay StartAccessIfNeeded

Comment 20 Nicola Baldo 2010-11-09 07:52:48 EST

(In reply to comment #19)
> Created an attachment (id=1010) [details]
> delay StartAccessIfNeeded

Unfortunately that patch won't work. The WifiPhy needs to be involved in order to schedule the EndTxNoAck event properly.

Comment 21 Mathieu Lacage 2010-11-09 09:10:27 EST

maybe it would be time to give up on our optimization to avoid the end-of-tx event at the phy layer and schedule this event all the time

Comment 22 Nicola Baldo 2010-11-09 10:35:03 EST

(In reply to comment #21)
> maybe it would be time to give up on our optimization to avoid the end-of-tx
> event at the phy layer and schedule this event all the time

I agree.

Comment 23 Nicola Baldo 2010-11-10 13:29:00 EST

thinking more about it, adding a TxEnd event on the PHY would actually require a major refactoring of the PHY state machine. This would require a lot of time and effort, which unfortunately I don't have at the moment.

So, I've opened the new bug 1028 to tackle this refactoring.

As for bug 555 (this bug), if any temporary fix exists which is not too ugly, I would be fine with pushing it. Otherwise, bug 555 will need to wait a lot of time until 1028 is fixed.

Comment 24 Nicola Baldo 2010-11-10 13:31:49 EST

Created attachment 1012 [details]
schedule EndTxNoAck in MacLow

New proposed patch where the EndTxNoAck event is scheduled in MacLow.

Of course this would need to be changed when bug 1028 is fixed, but for now it could be an acceptable fix.

This patch needs testing, some help would be very much appreciated.

Comment 25 Nicola Baldo 2010-12-21 12:57:02 EST

Created attachment 1017 [details]
tentative test case

Comment 26 Nicola Baldo 2010-12-21 16:10:24 EST

Created attachment 1018 [details]
test program

this program should trigger the bug, but in practice it doesn't...

Comment 27 Quincy Tse 2011-02-28 23:41:16 EST

Created attachment 1038 [details]
Short program showing non-uniform distribution of backoff

(In reply to comment #24)
> This patch needs testing, some help would be very much appreciated.

I'm getting an assertion failure after applying this patch to NS-3.10.

assert failed. cond="m_backoffSlots == 0", file=../src/devices/wifi/dcf-manager.cc, line=109

I'm not sure whether this is due to the reorganisations of the MAC 6673/6674/6713.

Attached is a program that measures the number of back off slots based on the measurements from the channel ((time between transmissions - DIFS)/slot time) when there is only one node transmitting (ie. measures the distribution of chosen back off slots).

Non patched code - gives a very small (non-zero) count of backoff = 0.

With patch (using simple patch -p1 < patch and also tried manually) - I get assert failure.

Comment 28 Quincy Tse 2011-03-02 22:59:21 EST

(In reply to comment #27)
> I'm not sure whether this is due to the reorganisations of the MAC
> 6673/6674/6713.

Applied patch to NS-3.9 - assert failure still there.

Tried commenting out the NS_ASSERT - results shows that distribution of (successful) backoff values are heavily skewed to the high side. (Without the patch, we get a tiny number for 0-slot and the rest uniformly distributed across the other slot values.

Comment 29 Nicola Baldo 2011-03-29 13:46:09 EDT

Created attachment 1054 [details]
added slot counting to the original test program

Comment 30 Nicola Baldo 2011-04-10 15:50:53 EDT

Created attachment 1065 [details]
new program working with latest ns-3-dev

Comment 31 Tom Henderson 2011-06-25 11:10:35 EDT

(In reply to comment #30)
> Created attachment 1065 [details]
> new program working with latest ns-3-dev

Nicola, can you please provide status on this bug; should it be closed or worked/tested further?

Comment 32 Nicola Baldo 2011-07-01 12:41:54 EDT

It should be tested further. The behavior reported by Quincy should be understood. And we should be finally able to either reproduce the original bug (i.e.: two subsequent broadcast packets, the second one gets a backoff=0, and gets discarded because of virtual collision), or declare that it does not occur. 

Unfortunately, as of now, I don't have available development time to investigate this issue.

Comment 33 Tom Henderson 2012-04-25 18:15:48 EDT

Here is a brief status update, after discussing this with Gary Pei.

We repeated Quincy's two observations:
1) unpatched ns-3-dev exhibits too low number of packets with backoff == 0
2) patched ns-3-dev asserts, and without assertion, the distribution of backoff is skewed

So Nicola's patch to schedule EndTxNoAck is suspect, and even the existing behavior (why backoff == 0 is so low) needs to be understood and perhaps improved.

Regarding the original bug report:
1) virtual collision detected on two subsequent unACKed frames
2) can't access medium immediately at time 0

The test case proposed in dcf-manager-test.cc will always generate a collision event, even if the EndTxNoAck patch works, because the DCF is being artificially stimulated by requesting access in an artificial way.  It is probably better to create a separate test or test case that tries to use DCF in the normal way.

The following test is proposed:
- construct a single node simulation
- schedule two back-to-back broadcast frames via a packet socket
- fail if a collision is detected
- succeed if the frames are transmitted over the air as expected

If the EndTxNoAck event works correctly, then request access should not be called in this case and we should instead observe that the first frame is backed off and transmitted, and the second frame follows (with its own backoff).

We could also use the same test case to test whether medium could be accessed at time 0 by picking the random seed such that the first backoff chosen is zero slots.

So, in summary, it seems like the way forward is to construct the failing test case mentioned above, and to understand why Quincy's program is failing (and turn this into a test that enforces that a uniform distribution is obtained).  Then to work on Nicola's patch to try to make both tests eventually pass.

Comment 34 Tom Henderson 2012-04-25 18:30:25 EDT

(In reply to comment #33)

> 
> The following test is proposed:
> - construct a single node simulation
> - schedule two back-to-back broadcast frames via a packet socket
> - fail if a collision is detected
> - succeed if the frames are transmitted over the air as expected
> 
> If the EndTxNoAck event works correctly, then request access should not be
> called in this case and we should instead observe that the first frame is
> backed off and transmitted, and the second frame follows (with its own
> backoff).
> 
> We could also use the same test case to test whether medium could be accessed
> at time 0 by picking the random seed such that the first backoff chosen is zero
> slots.

I should mention that Nicola's test-bug-555.cc program can be stripped down and stepped through for the two packet test proposed above.  It may be that the EndTxNoAck is not needed anymore for some reason (Nicola's test-bug-555.cc seems to pass now as is), in which case we just need to understand Quincy's test program results and turn it into a test, close this, and decide on leaving bug 1028 open or not.

Comment 35 Tom Henderson 2012-04-25 18:37:33 EDT

Created attachment 1390 [details]
Quincy's short program showing non-uniform distribution of backoff

Updated from attachment 1065 [details] to account for the default attributes being in class ns3::Dcf.

Not sure whether SetStandard() should be called to override attribute defaults; see the main() function.

Comment 36 Nicola Baldo 2012-04-26 12:48:58 EDT

Created attachment 1391 [details]
revised test program

At last, here is a revised version of the test program that reproduces the bug. 

It basically does what proposed by Tom, with the only exception that it does not report failure. The reason is that virtual collisions do not result in a lost packet, and there is no trace source related to virtual collisions. Hence you need to look at the logging of DcfManager to see that there is a virtual collision.

Comment 37 Nicola Baldo 2012-04-26 12:51:15 EDT

Created attachment 1392 [details]
log produced by the test program

attached log of DcfManager produced with the test program; just search for "collision" to see the buggy behavior.

Comment 38 Quincy Tse 2012-06-06 00:17:58 EDT

Regarding that unusually low 0-slot behaviour, I've hacked through a bit of combinatorial maths. The behaviour observed when I add more than 1 node (comparing against theoretical predictions) appears as though the contention window length is reduced by 1 and shifted by 1. (ie. The packet is just not getting sent immediately.)

eg. for a scenario of 3 saturated nodes and CW of 7, the resultant idle slot distribution matches the theoretical results for a scenario of 3 nodes and CW of 6, with the slots shifted right by 1.

The table below compares the behaviour against the theoretical results for a right-shifted-and-shortened-cw (as described above), and those for the case where the DCF never (or rarely) choose slot 0 (ie. when backoff counter is reset, it is reset to anything but 0.):

idle  --- Observation probabilities (%)
slots --- right-shifted+1 --- observed --- Never-0
  0             --              2%           17%
  1             50%            53%           29%
  2             28%            27%           21%
  3             14%            11%           15%
  4              6%             5%           10%
  5              2%             2%            5%
  6             .4%            .1%            2%
  7              0%             0%           .5%

Comment 39 Quincy Tse 2012-06-06 00:25:07 EDT

(In reply to comment #38)
> The behaviour observed when I add more than 1 node (comparing against  heoretical predictions)...

I should also clarify that these observations were made on ns3.10. I'll try running the multi-transmitter code in the latest -dev before uploading the code for it.

Comment 40 Nicola Baldo 2012-06-06 05:33:14 EDT

Quincy, in your sim program you have saturated traffic and you measure the backoff by connecting to the WifiPhy State trace. My understanding is that, because of the buggy behavior that I confirmed in my previous post, most packets with a 0 backoff face a virtual collision at the MAC, hence they undergo a further backoff before actually being delivered to the PHY. I think this explains the skew in the backoff distribution that you observed.

Comment 41 Nicola Baldo 2012-06-06 05:35:19 EDT

Changing status to PATCH WANTED, since the previous patches were reported not to be satisfactory.

Comment 42 Tom Henderson 2012-08-30 11:54:33 EDT

After spending some more time on this yesterday, using ns-3-dev code, I'd like to report the following.

I'm not able to reproduce the collision log posted by Nicola on 2012-04-26. Note that the underlying random variable to stream mapping has changed recently, but I iterated through many Rng run numbers including those that gave an initial backoff of 0 and I could not find collisions.

I'm not able to reproduce these collisions using the test program that Nicola posted, either by using different run numbers or by increasing the number of consecutive packets. All packets are successfully received.

I looked at the logs carefully and I can see that the behavior follows the following sequence (which I think is correct for broadcast frames):

frame transmission, (SIFS + AIFS + backoff), frame transmission, (SIFS+AIFS+backoff), etc.

I stepped through this part of the code (interaction between DcaTxop and DcfManager) and it seems to do the right thing.

I did not explore the distribution of chosen backoff slots (for the 1 node case) but I noticed that this is just calling GetInteger on a random variable (ranging from 0 to CW) and I didn't notice any skews in this when I looked at several different run numbers. Now, Quincy's original observations were based on looking at channel output, and there may be other interactions going on that are skewing the backoff statistics as observed on the channel, but I don't think they are coming from the DcfManager.

As I mentioned earlier, I think that Timo's original test case is not that relevant to the real operation that involves DcaTxop requesting access from DcfManager.

So, to summarize, I propose the following. If people agree that the above sequence of events is correct for broadcast, I volunteer to check in a modified version of the attached testcase as a regression test, and to validate that the distribution of backoff slots at the DcfManager is uniform, and to close this as RESOLVED INVALID

Comment 43 Tom Henderson 2012-08-30 11:55:45 EDT

Created attachment 1441 [details]
slightly revised test program

This is essentially Nicola's test program with a callback that is testing the transmission times of the default two-packet case.

Comment 44 Daniel L. 2012-10-24 11:56:02 EDT

Regarding Nicola's patch in comment 24 (schedule EndTxNoAck in MacLow), would it create a problem if we move m_currentPacket = 0 from DcaTxop::NotifyAccessGranted to DcaTxop::EndTxNoAck?

I tried that with the test program and the timing seems correct:

frame transmission, (SIFS + AIFS + backoff), frame transmission, (SIFS+AIFS+backoff), etc

The actual backoff experienced by the second packet seems to be uniform between 0 to 15 as well.

Previously second packets with backoff = 0 (e.g. RngRun=10) got virtual collision and re-selects the backoff again. Resulting in only a few packets having backoff = 0.

Comment 45 Nicola Baldo 2012-10-30 11:27:07 EDT

Created attachment 1460 [details]
yet another revised test program

I took Tom's latest program and just added the code to enforce a 0 backoff in DcaTxop. In this way I get the virtual collision:

1s -1 DcfManager:RequestAccess(): +1000000000.0ns 0x2568290 medium is busy: collision
1s -1 DcfManager:StartBackoffNow(): +1000000000.0ns 0x25684c0 start backoff=0 slots

for the record, I used changeset 9114:41bfd88f8055

Comment 46 Daniel L. 2012-11-02 10:27:40 EDT

Created attachment 1461 [details]
schedule EndTxNoAck in MacLow (revised for current ns-3-dev)

I took Nicola's patch from comment #24 and apply it to the current ns-3-dev. I moved "m_currentPacket = 0;" from DcaTxop::NotifyAccessGranted to DcaTxop::EndTxNoAck.

With this patch, the second packet has backoff slot uniformly from 0 to 15 slots. No virtual collision if slot 0 is selected.

I'm not sure if moving m_currentPacket = 0;" from DcaTxop::NotifyAccessGranted to DcaTxop::EndTxNoAck will cause any unexpected effect. Any thought on this?

Comment 47 Tom Henderson 2012-11-02 18:35:17 EDT

(In reply to comment #42)

> 
> I'm not able to reproduce the collision log posted by Nicola on 2012-04-26. 
> Note that the underlying random variable to stream mapping has changed
> recently, but I iterated through many Rng run numbers including those that gave
> an initial backoff of 0 and I could not find collisions.
> 

I see why I did not reproduce; I was using "--RngRun=number" option to set the run number, but there is no command line argument parsing in my test program.  Using NS_GLOBAL_VALUE=RngRun=10 will trigger it.

Comment 48 Tom Henderson 2012-11-02 18:36:31 EDT

(In reply to comment #46)
> Created attachment 1461 [details]
> schedule EndTxNoAck in MacLow (revised for current ns-3-dev)
> 
> I took Nicola's patch from comment #24 and apply it to the current ns-3-dev. I
> moved "m_currentPacket = 0;" from DcaTxop::NotifyAccessGranted to
> DcaTxop::EndTxNoAck.
> 
> With this patch, the second packet has backoff slot uniformly from 0 to 15
> slots. No virtual collision if slot 0 is selected.
> 
> I'm not sure if moving m_currentPacket = 0;" from DcaTxop::NotifyAccessGranted
> to DcaTxop::EndTxNoAck will cause any unexpected effect. Any thought on this?


I tested this lightly today; it seems OK at first glance and only causes the routing-aodv-regression (traces) to diverge.  All of the wifi tests pass.

I think we should try more testing on this and if no problems found, try to get it merged for ns-3.16.

Comment 49 Daniel L. 2012-11-05 11:06:27 EST

(In reply to comment #48)
> (In reply to comment #46)
> > Created attachment 1461 [details]
> > schedule EndTxNoAck in MacLow (revised for current ns-3-dev)
> > 
> > I took Nicola's patch from comment #24 and apply it to the current ns-3-dev. I
> > moved "m_currentPacket = 0;" from DcaTxop::NotifyAccessGranted to
> > DcaTxop::EndTxNoAck.
> > 
> > With this patch, the second packet has backoff slot uniformly from 0 to 15
> > slots. No virtual collision if slot 0 is selected.
> > 
> > I'm not sure if moving m_currentPacket = 0;" from DcaTxop::NotifyAccessGranted
> > to DcaTxop::EndTxNoAck will cause any unexpected effect. Any thought on this?
> 
> 
> I tested this lightly today; it seems OK at first glance and only causes the
> routing-aodv-regression (traces) to diverge.  All of the wifi tests pass.
> 

I'm looking at the routing-aodv-regression test to make sure that the differences were due to the original tests have bogus virtual collisions. And that the proposed fix doesn't change other things. As far as I can see now, the random number used by the routing-aodv-regression results in bogus virtual collisions. I will try with other random numbers.

> I think we should try more testing on this and if no problems found, try to get
> it merged for ns-3.16.

Comment 50 Daniel L. 2012-11-06 16:07:26 EST

> I'm looking at the routing-aodv-regression test to make sure that the
> differences were due to the original tests have bogus virtual collisions. And
> that the proposed fix doesn't change other things. As far as I can see now, the
> random number used by the routing-aodv-regression results in bogus virtual
> collisions. I will try with other random numbers.

I checked the failing routing-aodv-regression test. The bug-772 test fails when TCP is used. I believe the proposed patch didn't cause the problem. The random number used by the original test results in a bogus virtual collision when backoff=0. Since no virtual collision is present after the patch is applied, all results diverge after that point.

Another thing about the test (if someone wants to look into this), the patched code results in a very low number of packets received at the receiver. This is not the problem caused by the changed behavior. At one point in the simulation, an ARP request from one node is not received by the node with the matched IP address (it's a broadcast frame). After that point, an ACK from the receiver cannot be sent back to the sender, causing TCP to wait for timeout.

Bug-772 test doesn't fail when UDP is used since no bogus virtual collision occurred in the original test.

I'm not sure if the patched code invalidates the bug-772 test or not. Anyone has any thought on this?

Comment 51 Nicola Baldo 2012-11-12 12:05:41 EST

I was going to CC the AODV maintainer about this, but then I realized that we don't have an AODV maintainer any more...

Comment 52 Tom Henderson 2012-11-12 14:24:15 EST

(In reply to comment #51)
> I was going to CC the AODV maintainer about this, but then I realized that
> we don't have an AODV maintainer any more...

Daniel volunteered to regenerate the test case and confirm that the behavior is correct with the patched code (he reported that he already has looked at this and that the patched code behaves correctly in the test case).

Comment 53 Nicola Baldo 2012-12-06 13:44:07 EST

(In reply to comment #46)
> 
> I'm not sure if moving m_currentPacket = 0;" from
> DcaTxop::NotifyAccessGranted to DcaTxop::EndTxNoAck will cause any
> unexpected effect. Any thought on this?

I had a look at the code with this in mind and the movement of "m_currentPacket = 0" to EndTxNoAck makes sense to me, it's consistent with the fact that in the current code for non-broadcast packets the same instruction is always done at the end of the frame exchange sequence. 

That said, and considered the additional tests done by Daniel, I think it is ok push this patch and close this bug.

Thanks Daniel for your effort!

Comment 54 Quincy Tse 2013-08-07 02:45:15 EDT

Created attachment 1652 [details]
Test program demonstrating problem still exists

Uploaded new version of program demonstrating there is still a lower-than-expected likelihood of immediate DCF access.

Test program simulates 3 saturated nodes and shows that most packets experiences at least 1 slot time of back off. Intuition would suggest that as more nodes are added, the likelihood of immediate DCF access by at least one of the nodes should increase.

Comment 55 Tom Henderson 2016-09-28 00:15:23 EDT

the patch to bug 2369 (changeset 12345:a94d790ef6e5) changed the behavior so that backoff occurs if the request for access arrives during AIFS interval, but allows immediate access if the medium has been idle for time >= AIFS.  See discussion and documentation in bug 2369.