Bug 1783 - Experiencing drops during fast recovery causes TCP's congestion window to blow up.
Experiencing drops during fast recovery causes TCP's congestion window to blo...
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: tcp
ns-3.18
Mac Intel Mac OS
: P5 critical
Assigned To: natale.patriciello
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-10-25 09:40 EDT by l.salameh
Modified: 2018-03-03 06:01 EST (History)
5 users (show)

See Also:


Attachments
Patch to fix tcp window problem (5.43 KB, application/octet-stream)
2013-10-25 09:40 EDT, l.salameh
Details
Updated patch (1.61 KB, patch)
2016-02-15 10:57 EST, Alexander Krotov
Details | Diff
Updated patch to account for retransmission (5.43 KB, patch)
2016-02-19 09:56 EST, natale.patriciello
Details | Diff
Test case (8.55 KB, patch)
2016-02-19 09:57 EST, natale.patriciello
Details | Diff
Subtraction made safe (2.60 KB, patch)
2016-02-22 06:46 EST, natale.patriciello
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description l.salameh 2013-10-25 09:40:33 EDT
Created attachment 1686 [details]
Patch to fix tcp window problem

The problem affects the tcp-newreno.cc implementation.

The sequence of events to trigger the bug are better explained with diagrams, and the steps are as follows:

1. A TCP flow experiences a series of losses, receives 3 duplicate acks, and enters fast recovery. It sets ssThresh to be half the cWnd.

   |<-------------XXXX--->|  cWnd
                   |
HeadSequence    losses   HighTxMark

2. Duplicate acks during fast recovery inflate cwnd. There are more losses during fast recovery.

   |<-------------XXXX--->|  original cWnd

         |<-------XXXX------XX----------------------X--------->| inflated cWnd
                            |                       |
         HeadSequence      Extra losses during fast recovery   HighTxMark




3. The flow recovers from the original losses, exits fast recovery. At this point cWnd is restored to half its value before the losses. We immediately gets 3 duplicate acks and enter fast recovery again. Unfortunately, the flightSize() calculation which determines the ssThresh and hence the new cWnd uses HighTxMark - HeadSequence shown below:


                          |<XX----------------------X--------->| 
                            
                       HeadSequence                        HighTxMark

I.e. cWnd is now set to a bogus high value, not in fact a quarter of its original value before the 1st loss.

4. Even if we try to limit cWnd to the correct value in step 3, on receiving a new ack during this second fast recovery stage, and attempting to deflate the window, the calculation seq - HeadSequence might be bigger than cWnd. Deflating cWnd by a larger amount will cause the unsigned variable to wrap around.

See attached for bug fix patch, which was reworked from ns-2's handling of this situation.
Comment 1 Alexander Krotov 2016-02-15 10:57:58 EST
Created attachment 2274 [details]
Updated patch

Updated patch to apply on top of current ns-3-dev.
Comment 2 natale.patriciello 2016-02-15 12:05:51 EST
 TcpSocketBase::BytesInFlight () const
 {
   NS_LOG_FUNCTION (this);
-  return m_highTxMark.Get () - m_txBuffer->HeadSequence ();
+  uint32_t flightSize = m_nextTxSequence.Get () - m_txBuffer->HeadSequence ();
+  uint32_t duplicatesSize = m_dupAckCount * m_tcb->m_segmentSize;
+  return duplicatesSize > flightSize ? 0 : flightSize - duplicatesSize;
 }


This is not correct. In my opinion, every attempts which involves m_dupAckCount variable is not going to work, because this variable contains also out-of-order retransmissions, and in general every mistake we made on guessing the loss of a segment. Then, it accounts for a static dimension of a segment (which is not true in every case).

My point is (and it's stated in another, similar, bug) that we need to associate informations into a list of sent blocks. Then, based on the received ACK (or SACK informations) we should made this calculation. Currently there is some proof-of-concept code, but nobody has the time to actively working on it.
Comment 3 Alexander Krotov 2016-02-16 03:30:35 EST
If you download the patch, it contains the link to https://tools.ietf.org/html/rfc4898#page-23, which suggests this equation for the case when no SACK is implemented:

PipeSize=SND.NXT-SND.UNA+(retransmits-dupacks)*CurMSS

What attached patch implements is:

PipeSize=SND.NXT-SND.UNA-dupacks*CurMSS

You are right that we should account for retransmissions, but there is only one retransmission at a time until SACK is implemented, so the error is 1 MSS at most, unlike the current implementation, under which the error is about the congestion window up to some constant factor. At least the patch fixes the bug (ssthreshold blows up in case of RTO during fast recovery). More accurate estimation, taking into account retransmissions and MSS changes can be left for bug #2256.
Comment 4 natale.patriciello 2016-02-19 09:56:20 EST
Created attachment 2281 [details]
Updated patch to account for retransmission

I added the support for retransmission on your patch; please check it.
Comment 5 natale.patriciello 2016-02-19 09:57:42 EST
Created attachment 2282 [details]
Test case

Here the test case.

To test it, please download the branch tcp-next from github.com/kronat/ns-3-dev-git since it depends on other patches (here the history):

Fri Feb 19 15:53:35 2016 +0100 6b09122 (HEAD -> tcp-next, mirror/tcp-next) internet: added tcp-bytes-in-flight test  [Natale Patriciello]
Fri Feb 19 15:53:12 2016 +0100 b8e5fd8 internet: TcpErrorModel Drop callback extended  [Natale Patriciello]
Fri Feb 19 15:52:21 2016 +0100 8def793 internet: (fixes #1783) BytesInFlight conforms to RFC 4898  [Natale Patriciello]
Fri Feb 19 15:45:48 2016 +0100 736afe5 internet: added BytesInFlight trace inside TcpSocketBase  [Natale Patriciello]
Fri Feb 19 15:41:07 2016 +0100 fd7834c internet: tracing of BytesInFlight inside TcpGeneralTest  [Natale Patriciello]
Fri Feb 19 11:56:46 2016 +0100 d51510a internet: tcp tx trace should not include header  [Natale Patriciello]
Comment 6 l.salameh 2016-02-19 19:16:12 EST
Thanks Natale. With your fixes, BytesInFlight() should return a reasonable value.

2 things to point out about the patch:

1. It would be nice if the tcp test case also verified cwnd to be the correct value, in addition to bytes in flight.

2. There is still one issue to be resolved, as described by point #4 in my original bug post. The second time we enter fast recovery, the cwnd has been deflated, and yet we still have this calculation:

   uint32_t bytesAcked = ackNumber - m_txBuffer->HeadSequence ();

and if we get a partial ack :
 
  if (segsAcked >= 1)
  {
     m_tcb->m_cWnd += m_tcb->m_segmentSize - bytesAcked;
  }
  else
 {
   m_tcb->m_cWnd -= bytesAcked;
 }

Calling cwnd = cwnd - bytesAcked may cause cwnd to wrap around if bytesAcked is larger than cwnd. I had run into this particular case 2 years ago when I first filed this bug.

If you check my original patch, I had code to fix this which kept track of the inflation size (I had misleadingly called the variable m_duplicatesSize) and tried to cap the cwnd recovery at that:
 
+      //Deflate cWnd by the amount acknowledged, but don't let that exceed inflation size.
+      if(bytesAcked > m_inflationSize){
+        bytesAcked = m_inflationSize;
+      }
+      m_inflationSize -= bytesAcked;
+      
+      NS_ASSERT(bytesAcked <= m_cWnd); //Don't let the cWnd wrap-around

Check the original patch for details.
Comment 7 l.salameh 2016-02-19 19:56:18 EST
Ooops, with the new code we shouldn't have to add a new variable, and the fix should be approximately:


+      uint32_t inflationSize = (m_dupAckCount - m_retransOut)*m_tcb->m_segmentSize * m_segmentSize;
+      if(bytesAcked > inflationSize){
+        bytesAcked = inflationSize;
+      }
+      NS_ASSERT(bytesAcked <= m_tcb->m_cWnd); //Don't let the cWnd wrap-around


The same logic should be applied as BytesInFlight() calculation to figure out how much inflation we have.

It would be great if we can make the tcp-bytes-in-flight-test actually test for 2 fast recovery scenarios back to back (which was triggering the original bug).

I.e.:
1. multiple losses per window.
2. fast network causing lots of dup acks and large inflation in first fast recovery.
3. trigger more drops in the window even during fast recovery.

Basically, it would be cool to test the exact scenario shown in diagrams for Comment#0.
Comment 8 natale.patriciello 2016-02-22 06:11:08 EST
The original point is:

4. Even if we try to limit cWnd to the correct value in step 3, on receiving a new ack during this second fast recovery stage, and attempting to deflate the window, the calculation seq - HeadSequence might be bigger than cWnd. Deflating cWnd by a larger amount will cause the unsigned variable to wrap around.

For sure we need more assert about such substraction, and I'll add these as soon as possible; but for what regards the test case, I'm in truble finding the right values for ACK and for dropping packets; I never experienced this overflow.
Comment 9 natale.patriciello 2016-02-22 06:25:02 EST
(In reply to natale.patriciello from comment #8) 
> For sure we need more assert about such substraction, and I'll add these as
> soon as possible; but for what regards the test case, I'm in truble finding
> the right values for ACK and for dropping packets; I never experienced this
> overflow.

And probably this is due to the fact that, with the new BytesInFlight, we never have more than cWnd in flight (and it's exactly the point of the bug, in first instance).
Comment 10 natale.patriciello 2016-02-22 06:46:51 EST
Created attachment 2289 [details]
Subtraction made safe

Here's the patch I'd like to commit on top of the others; it should prevents any wrap-around.
Comment 11 natale.patriciello 2016-02-22 06:48:30 EST
(In reply to natale.patriciello from comment #10)
> Created attachment 2289 [details]
> Subtraction made safe
> 
> Here's the patch I'd like to commit on top of the others; it should prevents
> any wrap-around.

Uh, I forgot to ask your feedback as soon as possible :) thanks!
Comment 12 l.salameh 2016-02-22 08:17:27 EST
The BytesInFlight is indeed fixed correctly, which means the cWnd is the correct, deflated size after we exit the first fast recovery. But we still have the calculation:

uint32_t bytesAcked = ackNumber - m_txBuffer->HeadSequence ();

Which, given an inflated window, may be larger than cWnd, causing the wrap around.

Although the SafeSubtraction fix prevents wrap around and the window blowing up, I think we still want to deflate cWnd, (i.e. subtract more than 0), by the total amount of inflation incurred during this fast recovery phase.

Just wondering whether anyone else has any thoughts on this and whether it is correct.
Comment 13 natale.patriciello 2016-02-22 09:14:17 EST
(In reply to l.salameh from comment #12)
> The BytesInFlight is indeed fixed correctly, which means the cWnd is the
> correct, deflated size after we exit the first fast recovery. But we still
> have the calculation:
> 
> uint32_t bytesAcked = ackNumber - m_txBuffer->HeadSequence ();
> 
> Which, given an inflated window, may be larger than cWnd, causing the wrap
> around.

We are in the case ackNumber < m_recover : I have difficulties to imagine a scenario where we are in fast recovery and we receive a partial ACK for more bytes than the cWnd.

> Although the SafeSubtraction fix prevents wrap around and the window blowing
> up, I think we still want to deflate cWnd, (i.e. subtract more than 0), by
> the total amount of inflation incurred during this fast recovery phase.

In fact SafeSubtraction does the deflating. And we are deflating correctly now..
Anyway, for the future I'd like to remove inflation/deflation, following the Linux example.

> Just wondering whether anyone else has any thoughts on this and whether it
> is correct.

Me too...
Comment 14 natale.patriciello 2016-02-23 04:18:59 EST
Initial fix in 11906:d8807ab22438

Test case in 11908:24938385e374

Leave open for more checks in the test
Comment 15 Tom Henderson 2016-02-24 02:00:06 EST
marking patch wanted for more checks in test
Comment 16 natale.patriciello 2018-03-03 06:01:17 EST
Latest fixes in  13372:950f852a89f9