Bug 2470 - handshake to setup the Block Ack Agreement is not protected
handshake to setup the Block Ack Agreement is not protected
Status: NEEDINFO
Product: ns-3
Classification: Unclassified
Component: wifi
ns-3.27
All All
: P5 normal
Assigned To: sebastien.deronne
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2016-08-07 14:54 EDT by Tom Henderson
Modified: 2017-11-03 05:21 EDT (History)
2 users (show)

See Also:


Attachments
proposed solution (7.33 KB, patch)
2016-12-14 12:49 EST, sebastien.deronne
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Henderson 2016-08-07 14:54:49 EDT
When conditions exist to trigger a ADDBA Request/Response, packets to that remote address and TID are added to a list of QosBlockedDestinations (blocked until the handshake completes).

However, it may be the case that either the Request or Response exceed the retry limit, and is lost.  There is no mechanism to restart this handshake, and even subsequent frames that arrive will not trigger a new handshake attempt.  The destination/tid becomes blocked for the rest of the simulation.

In my opinion, some kind of a timer protecting this is needed, such that if a timeout expires without successful setup, the station either retries the handshake or unblocks the address/tid so that a subsequent AccessGranted event could trigger a new handshake.  Also, I recommend a test program that explicitly forces a loss of either the Request or Response and checks that the proper state is restored on both endpoints.
Comment 1 sebastien.deronne 2016-08-13 06:08:05 EDT
I agree we should have a kind of protection here.
Is there anything mentioned in the std about that?
Comment 2 sebastien.deronne 2016-11-01 11:38:46 EDT
We should unblock this thread.
Any one knows whether some kind of protection is implemented in the field?
Comment 3 sebastien.deronne 2016-11-23 14:26:42 EST
Tom, do you have a scenario to reproduce?
Comment 4 sebastien.deronne 2016-11-23 16:43:05 EST
(In reply to sebastien.deronne from comment #3)
> Tom, do you have a scenario to reproduce?

I could reproduce with some quick hack in the code.
Comment 5 sebastien.deronne 2016-11-23 16:53:21 EST
This might help:

http://linux-wireless.vger.kernel.narkive.com/izrVkBAO/patch-mac80211-fix-the-way-addba-request-count-being-modified

My view is that we should add a counter (parameter) of the maximum number of allowed ADDBA attempts (and 0 meaning infinite) iso using maxSlrc. When this maximum number of attempts is reached, A-MPDU should be disabled for that TID. Also, AP that does not support Block Ack should not reply to ADDBA, but this is maybe beyond the scope of this issue.

Any other view or complement on this?
Comment 6 Ioannis 2016-11-26 15:55:39 EST
I found the following

- IEEE-802.11-2012std (10.5.2.2 Procedure at the originator):

"If there is no response from the recipient within dot11ADDBAFailureTimeout, the STA has not established a Block Ack mechanism with the recipient STA; and the MLME shall issue an MLME-ADDBA.confirm primitive with a result code of TIMEOUT."

- Next Generation Wireless LANs: 802.11n and 802.11ac (2nd edition), E. Perahia, R. Stacey:

"The block ack session is initiated by the originator sending an ADDBA Request frame. In response to a correctly received ADDBA Request frame, the responder sends an ACK. After further processing the responder sends an ADDBA Response frame to which the originator responds with an ACK if correctly received. The originator and responder will retransmit ADDBA Request/Response if the expected ACK is not received. An inactivity timeout at the originator will detect a failed session setup." 

Regarding the SLRC (long retry counter) or SSRC (short retry counter) there are two sections in IEEE-802.11-2012std;
 
- 9.3.3 Random backoff time:
The SSRC shall be incremented when any short retry count (SRC) associated with any MPDU of type Data is incremented. The SLRC shall be incremented when any long retry count (LRC) associated with any MPDU of type Data is incremented. 

- 9.19.2.6 Retransmit procedures:
QoS STAs shall maintain a short retry counter and a long retry counter for each MSDU, A-MSDU, or MMPDU that belongs to a TC requiring acknowledgment. The initial value for the short and long retry counters shall be 0. QoS STAs also maintain a short retry counter and a long retry counter for each AC. They
are defined as QSRC[AC] and QLRC[AC], respectively, and each is initialized to a value of 0. After transmitting a frame that requires an immediate acknowledgment, the STA shall perform either of the acknowledgment procedures, as appropriate, that are defined in 9.3.2.8 and 9.21.3. The short retry count for an MSDU or A-MSDU that is not part of a Block Ack agreement or for an MMPDU shall be incremented every time transmission of a frame of length less than or equal to dot11RTSThreshold fails for that MSDU, A-MSDU, or MMPDU. QSRC[AC] shall be incremented every time transmission of an A-MPDU or frame of length less than or equal to dot11RTSThreshold fails. This short retry count and the QoS STA QSRC[AC] shall be reset when an A-MPDU or frame of length less than or equal to dot11RTSThreshold succeeds. The long retry count for an MSDU or A-MSDU that is not part of a Block Ack agreement or for an MMPDU shall be incremented every time transmission of a MAC frame of length greater than dot11RTSThreshold fails for that MSDU, A-MSDU, or MMPDU. QLRC[AC] shall be incremented every time transmission of an A-MPDU or frame of length greater than or equal to dot11RTSThreshold fails. This long retry count and the QLRC[AC] shall be reset when an A-MPDU or frame of length greater than dot11RTSThreshold succeeds. ...
Retries for failed transmission attempts shall continue until the short retry count for the MSDU, A-MSDU, or MMPDU is equal to dot11ShortRetryLimit or until the long retry count for the MSDU, A-MSDU, or MMPDU is equal to dot11LongRetryLimit. When either of these limits is reached, retry attempts shall cease, and the MSDU, A-MSDU, or MMPDU shall be discarded.


So, I think that indeed a timer and Ssrc are needed for these frames.
Comment 7 sebastien.deronne 2016-12-09 14:32:21 EST
What would happen if the timeout has elapsed? Does it continuously retry to establish the session? Also, I do not get your point wrt ssrc, could you please explain?
Comment 8 sebastien.deronne 2016-12-14 12:49:57 EST
Created attachment 2701 [details]
proposed solution

I made a patch with a proposed solution for this bug.

I am not sure there is a strict rule in the standard for this, but they do mention an addbafailuretimeout. I thus added a timeout in EdcaTxopN: once elapsed and no addba response was received yet, it restarts the addba request procedure.
Comment 9 Ioannis 2016-12-17 10:23:15 EST
(In reply to sebastien.deronne from comment #7)
> What would happen if the timeout has elapsed? Does it continuously retry to
> establish the session? 

I think that during that session, if the originator(recipient) doesn't receive an ack to establish the session, they retransmit the missed frame up to ssrc or dot11ADDBAFailureTimeout. So, if this timeout elapses the BlockAck is not established and the frames are transmitted as normal PPDUs. I'm not sure if after some time the nodes retry to establish the session (i.e. after BlockAckTimeout, since if BlockAckTimeout occurs the Block Ack is considered to be torn down (6.3.29.2.2 Semantics of the service primitive))


I haven't gone through the patch yet, but I think there is a typo in the attribute
"AddBaFailureTimeout",
Specifies a time limit (in multiple of 104 microseconds) should be 
Specifies a time limit (in multiple of 1024 microseconds)

and that the minimum value should be 1 (according to 6.3.29.2.2 Semantics of the service primitive).
Comment 10 sebastien.deronne 2016-12-17 10:27:05 EST
(In reply to Ioannis from comment #9)
> (In reply to sebastien.deronne from comment #7)
> > What would happen if the timeout has elapsed? Does it continuously retry to
> > establish the session? 
> 
> I think that during that session, if the originator(recipient) doesn't
> receive an ack to establish the session, they retransmit the missed frame up
> to ssrc or dot11ADDBAFailureTimeout. So, if this timeout elapses the
> BlockAck is not established and the frames are transmitted as normal PPDUs.
> I'm not sure if after some time the nodes retry to establish the session
> (i.e. after BlockAckTimeout, since if BlockAckTimeout occurs the Block Ack
> is considered to be torn down (6.3.29.2.2 Semantics of the service
> primitive))
> 

I understand the opposite: if the timeout elapsed, it starts a new BA session establishment, otherwise in a scenario with a lot of stations trying to establish a BA session we have a high chance to not be able to use Block ACKs, which is not really acceptable in my opinion.

> 
> I haven't gone through the patch yet, but I think there is a typo in the
> attribute
> "AddBaFailureTimeout",
> Specifies a time limit (in multiple of 104 microseconds) should be 
> Specifies a time limit (in multiple of 1024 microseconds)
> 
> and that the minimum value should be 1 (according to 6.3.29.2.2 Semantics of
> the service primitive).
Comment 11 Ioannis 2016-12-17 10:47:52 EST
(In reply to sebastien.deronne from comment #10)
> I understand the opposite: if the timeout elapsed, it starts a new BA
> session establishment, otherwise in a scenario with a lot of stations trying
> to establish a BA session we have a high chance to not be able to use Block
> ACKs, which is not really acceptable in my opinion.

You might be right but 
In a dense deployment under full buffer conditions, most of the nodes won't be able to establish the session. What if the nodes continuously retry to establish a BA session and end up in a situation where most of the frames flying around are management frames.
Comment 12 Tom Henderson 2016-12-20 01:35:28 EST
> I haven't gone through the patch yet, but I think there is a typo in the
> attribute
> "AddBaFailureTimeout",
> Specifies a time limit (in multiple of 104 microseconds) should be 
> Specifies a time limit (in multiple of 1024 microseconds)
> 
> and that the minimum value should be 1 (according to 6.3.29.2.2 Semantics of
> the service primitive).

I tend to agree on this point; why should we provide an option (special value of zero) to disable this behavior?
Comment 13 Tom Henderson 2016-12-20 01:42:55 EST
Regarding the rest of the patch and the patch on what to do upon a timeout, I might lean towards Ioannis's viewpoint that if the conditions do not support setting up agreement, that it may be more harmful to just try incessantly without trying instead normal PPDUs (at least for a while).  But I suspect we won't get clear guidance on this detail and we are just guessing.  I wonder if conditions that lead to such failure will also lead to disassociation, allowing reassociation and perhaps reset/retry at that point?

Also, I'd prefer to add a regression test case where these frames are forcibly dropped to check the behavior.
Comment 14 sebastien.deronne 2016-12-21 15:01:51 EST
(In reply to Tom Henderson from comment #12)
> > I haven't gone through the patch yet, but I think there is a typo in the
> > attribute
> > "AddBaFailureTimeout",
> > Specifies a time limit (in multiple of 104 microseconds) should be 
> > Specifies a time limit (in multiple of 1024 microseconds)
> > 
> > and that the minimum value should be 1 (according to 6.3.29.2.2 Semantics of
> > the service primitive).
> 
> I tend to agree on this point; why should we provide an option (special
> value of zero) to disable this behavior?

correct, standard says valid range starts from 1
Comment 15 sebastien.deronne 2016-12-21 15:04:16 EST
(In reply to Tom Henderson from comment #13)
> Regarding the rest of the patch and the patch on what to do upon a timeout,
> I might lean towards Ioannis's viewpoint that if the conditions do not
> support setting up agreement, that it may be more harmful to just try
> incessantly without trying instead normal PPDUs (at least for a while).  But
> I suspect we won't get clear guidance on this detail and we are just
> guessing.  I wonder if conditions that lead to such failure will also lead
> to disassociation, allowing reassociation and perhaps reset/retry at that
> point?
> 
> Also, I'd prefer to add a regression test case where these frames are
> forcibly dropped to check the behavior.

I indeed think best would be to restart association, to avoid blocking scenario. 

We are guessing what would be the most appropriate solution, standard does not mention anything and I have actually no idea what is implemented in practice.

I agree we should add a regression test case.
Comment 16 sebastien.deronne 2017-04-21 06:06:04 EDT
Still pending action: investigate how this work in practice.