Bug 2493 - EDCA entering backoff while in SIFS
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: wifi
Version: unspecified
Hardware: All All
Importance: P5 normal
Assigned To: Tom Henderson
Reported: 2016-09-08 00:14 EDT by Tom Henderson
Modified: 2016-09-28 00:22 EDT (History)

Attachments
patch to fix (7.44 KB, patch)
2016-09-15 16:20 EDT, Tom Henderson

Description Tom Henderson 2016-09-08 00:14:29 EDT
I was alerted to a bug that is triggered by using EDCA with flows in multiple access categories, although the test case might be reducible to a single access category.

The problem occurs when the sender is within a TXOP and is waiting in the SIFS period between the receipt of the ACK and the start of the next packet transmission.  If a packet is enqueued to that EdcaTxopN during this interval, and the TXOP expires shortly afterwards, an assert eventually triggers after the next ACK is received.

time 1 ---------QoS data frame ----->

time 2 <--------Ack-----------------

---> time 3 new packet enqueued

time 4 ---------QoS data frame------->

(... TXOP expires)

time 5 <--------Ack -----------------
** assert fires here upon trying to re-enter backoff

time 4 is (time 2 + SIFS) and there is still time remaining in the TXOP to send the frame at time 4, but this is the last frame in the TXOP, so when time 5 occurs, EdcaTxopN::GotAck() will call StartBackoffNow ().

StartBackoffNow () will assert if the backoff count is not already zero.  In this case, the frame that arrived at time 3 causes EdcaTxopN::Queue () to be called, which calls StartAccessIfNeeded (), and since there isn't an outstanding packet while in SIFS, that method will call DcfManager::RequestAccess() which creates a new backoff.  

What seems to be needed is a way to detect at the EdcaTxopN whether the station is in SIFS, and if so, when enqueuing, to suppress the call to StartAccessIfNeeded () (or reschedule it to after SIFS).

I tried an approach that fixed it for Wi-Fi but ended up breaking the mesh and wave cases, so I am rethinking the approach.
Comment 1 Tom Henderson 2016-09-15 16:07:00 EDT
I have to correct the above timeline slightly.

time 1 ---------QoS data frame ----->

time 2 <--------Ack-----------------

---> time 3 new packet enqueued

time 4 ---------QoS data frame------->

(... TXOP expires)

time 5 <--------Ack -----------------
** assert fires here upon trying to re-enter backoff


The problem here is actually at time 2.  There is backoff time remaining, but the code in EdcaTxopN::HasTxop () returns false because there is no pending QoS frame.  As a result, a new backoff starts at time 2.

Meanwhile, the SIFS period is between times 2 and 4, and during this period, a new QoS frame is enqueued.  When time 4 occurs (triggered by the call to EdcaTxopN::StartNext ()), there is now a packet to be sent, so it is transmitted despite the backoff timer now being (incorrectly) non-zero.  Later, at time 5, upon another Ack, HasTxop () again returns false, the code calls StartBackoffNow (), and the assert testing that m_backoffSlots == 0 is hit.
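
To make the sequence concrete, here is a minimal toy model of times 2 through 5. It is only a sketch: ToyBackoffState and ReplayTimeline are hypothetical names, not ns-3 classes, the slot counts are arbitrary, and an exception stands in for the assert so that a caller can observe the failure.

```cpp
#include <cstdint>
#include <stdexcept>

// Toy stand-in for the per-AC backoff bookkeeping; NOT the ns-3 classes.
struct ToyBackoffState {
  std::uint32_t backoffSlots = 0;

  // Models the precondition described above. ns-3 uses an assert here; an
  // exception is used in this sketch so the failure is observable.
  void StartBackoffNow(std::uint32_t slots) {
    if (backoffSlots != 0) {
      throw std::logic_error("StartBackoffNow called with backoff pending");
    }
    backoffSlots = slots;
  }
};

// Replays times 2 to 5 of the timeline. Returns true if the StartBackoffNow
// call at time 5 trips the precondition.
bool ReplayTimeline(bool backoffStartedAtTime2) {
  ToyBackoffState state;

  // time 2: Ack received with no frame queued, so a backoff is started
  // even though TXOP time remains (slot count 3 is arbitrary).
  if (backoffStartedAtTime2) {
    state.StartBackoffNow(3);
  }

  // times 3-4: a frame is enqueued during SIFS and sent at time 4 without
  // the backoff being counted down, so backoffSlots is left unchanged.

  // time 5: Ack received, TXOP has expired, backoff is started again.
  try {
    state.StartBackoffNow(5);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

In this model, ReplayTimeline (true) reports the failure and ReplayTimeline (false) does not, which matches the observation that the spurious backoff at time 2 is the root of the problem.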
Comment 2 Tom Henderson 2016-09-15 16:18:39 EDT
> 
> The problem here is actually at time 2.  There is backoff time remaining,
                                                     ^^^^
                                                     TXOP
> but the code in EdcaTxopN::HasTxop () returns false because there is no
> pending QoS frame.  As a result, a new backoff starts at time 2.

Sorry, another small correction to the above (it is TXOP time that is remaining, not backoff time).

I have taken the approach that the right way to handle this is to suppress the backoff that occurs in GotAck () when it is in the middle of a TXOP, to allow for the possibility that a new QoS frame arrives during the SIFS interval (which is the case I started to look into).  If no such frame materializes during the SIFS interval, StartNext () will return without sending (and a future Enqueue () can start access procedures again).  If StartNext () finds that it has a pending frame but no TXOP time left, then it can start backoff again (basically, moving this case over from GotAck () to StartNext ()).
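
The decision that moves to StartNext () can be sketched as a small pure function. This is an illustration of the three outcomes described above, not the code in the attached patch; NextAction and DecideAfterSifs are hypothetical names.

```cpp
// Hypothetical outcomes of the post-SIFS decision; not ns-3 types.
enum class NextAction { Idle, SendFrame, StartBackoff };

// Sketch of the choice made once SIFS has elapsed after an Ack.
// hasPendingFrame: a QoS frame is queued (possibly enqueued during SIFS).
// txopTimeLeft:    enough TXOP time remains to send that frame.
NextAction DecideAfterSifs(bool hasPendingFrame, bool txopTimeLeft) {
  if (!hasPendingFrame) {
    // Nothing to send: return without backoff; a later enqueue restarts access.
    return NextAction::Idle;
  }
  // A frame is pending: send it if the TXOP allows, otherwise back off.
  return txopTimeLeft ? NextAction::SendFrame : NextAction::StartBackoff;
}
```

The point of the sketch is that no backoff is started at Ack time; the only path to backoff is a pending frame with no TXOP time left.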

I had a look at the standard for when to enter backoff.

"The backoff procedure shall be invoked for an EDCAF when any of the following events occurs:
a) A frame with that AC is requested to be transmitted, the medium is busy as indicated by either
physical or virtual CS, and the backoff timer has a value of 0 for that AC.
b) The final transmission by the TXOP holder initiated during the TXOP for that AC was successful
and the TXNAV timer has expired.
c) The transmission of the initial frame of a TXOP of that AC fails.
d) An internal collision is reported for that EDCAF (see 9.19.2.3).

In addition, the backoff procedure may be invoked for an EDCAF when the transmission of a non-initial frame
by the TXOP holder fails."

We pretty much do the above, except for case a), in which we are not checking that the medium is busy and the backoff timer is zero.  Performing these checks requires exposing a couple of methods that are currently private in dcf-manager.h (namely, IsBusy () and GetBackoffSlots ()).  So my patch makes these public and implements this check.
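
As a rough sketch of the case a) check (not the patch itself), the condition reduces to a conjunction of the two pieces of state. ChannelView and ShouldInvokeBackoffOnRequest are hypothetical names; the two fields stand in for whatever IsBusy () and GetBackoffSlots () would report.

```cpp
#include <cstdint>

// Hypothetical snapshot of the state needed for case a); not an ns-3 type.
struct ChannelView {
  bool mediumBusy;             // physical or virtual carrier sense busy
  std::uint32_t backoffSlots;  // current backoff timer value for this AC
};

// Case a): a frame is requested to be transmitted, the medium is busy, and
// the backoff timer for that AC is 0. Only then is backoff invoked.
bool ShouldInvokeBackoffOnRequest(const ChannelView& view) {
  return view.mediumBusy && view.backoffSlots == 0;
}
```

With this condition, a request that arrives while a backoff is already pending, or while the medium is idle, does not create a new backoff.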

The specification language also suggests to me that it may be valid to re-enter backoff for cases b-d even if the backoff timer has a non-zero value.  However, I left the assert in DcfManager::StartBackoffNow () for now, and decided to wrap calls to it with a new EdcaTxopN helper method that suppresses calls to start backoff when one is already running.
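
The wrapper idea can be illustrated with a toy model. The names ToyDcfState and StartBackoffIfIdle are hypothetical and do not reflect the helper's actual name or signature in the patch; the sketch only shows a guard placed in front of a call that asserts.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the object that owns the backoff counter; NOT ns-3 code.
struct ToyDcfState {
  std::uint32_t slots = 0;

  // Keeps the existing precondition: no backoff may already be pending.
  void StartBackoffNow(std::uint32_t n) {
    assert(slots == 0);
    slots = n;
  }
};

// Hypothetical helper: start a backoff only if none is running.
// Returns true if a new backoff was started, false if it was suppressed.
bool StartBackoffIfIdle(ToyDcfState& state, std::uint32_t n) {
  if (state.slots != 0) {
    return false;  // backoff already running; leave it untouched
  }
  state.StartBackoffNow(n);
  return true;
}
```

Keeping the assert in place while guarding the callers means any remaining unguarded call path would still be caught during testing.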
Comment 3 Tom Henderson 2016-09-15 16:20:39 EDT
Created attachment 2581 [details]
patch to fix

All tests pass except mesh PCAP-based tests which need to be regenerated.
Comment 4 Tom Henderson 2016-09-16 22:16:32 EDT
(In reply to Tom Henderson from comment #3)
> Created attachment 2581 [details]
> patch to fix
> 
> All tests pass except mesh PCAP-based tests which need to be regenerated.

The mesh tests are failing due to more than timing differences.  It seems that the DCF/EDCA immediate access scenarios (broadcast collisions) are exacerbated by this (bug 2369).  Need to look again at 2369...
Comment 5 Tom Henderson 2016-09-28 00:22:38 EDT
It appears that the patch to 2369, which forces a backoff when the request to DcfManager occurs during AIFS, resolves this problem.  That patch was committed as changeset 12345:a94d790ef6e5.