Bug 2649 - disabling TCP SACK does not fall back to NewReno behavior
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: tcp
Version: unspecified
Hardware: All
OS: All
Importance: P3 normal
Assigned To: natale.patriciello
Reported: 2017-02-05 21:08 EST by Tom Henderson
Modified: 2017-09-26 21:36 EDT

Attachments
correct packet trace (13.66 KB, text/plain), 2017-09-25 19:02 EDT, Tom Henderson
incorrect packet trace (14.25 KB, text/plain), 2017-09-25 19:02 EDT, Tom Henderson

Description Tom Henderson 2017-02-05 21:08:07 EST
Today I was debugging the ns3-tcp-cwnd test failure (this is a Network Simulation Cradle test suite) that has been occurring since SACK was merged.

I disabled SACK for this test case in the hope that this was the issue, but still found different behavior.

The second test case of src/test/ns3tcp/ns3tcp-cwnd-test-suite.cc fails at this point:

  // At the point of loss, sndNxt = 15545; sndUna = 9113.  FlightSize is 4824,
  // so there are 9 segments outstanding.  Cut ssthresh to 9/2 (2412) and
  // cwnd to (9/2 + 3) = 4020
  event = m_responses.Get (10);
  NS_TEST_ASSERT_MSG_EQ (event.m_newCwnd, (MSS * 15)/2, "Wrong new cwnd value in cwnd change event " << 10);

The problem appears to be that the EnterRecovery() method assumes SACK-based RFC 6675 recovery:

 (4.2) ssthresh = cwnd = (FlightSize / 2)
              The congestion window (cwnd) and slow start threshold
              (ssthresh) are reduced to half of FlightSize per [RFC5681].

However, RFC 5681 and NewReno (RFC 6582) specify:

       The lost segment starting at SND.UNA MUST be retransmitted and
       cwnd set to ssthresh plus 3*SMSS.

The test assumes that the 3*SMSS is added, which is why it currently fails.
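
For illustration, here is a minimal standalone sketch (not ns-3 code; the 536-byte MSS and the nine outstanding segments are taken from the test comment above) of the two computations being compared:

  #include <cstdint>
  #include <iostream>

  int main ()
  {
    const uint32_t mss = 536;                  // segment size used by this test
    const uint32_t flightSize = 9 * mss;       // 4824 bytes outstanding at the point of loss

    // RFC 6675 (SACK-based recovery), step (4.2): ssthresh = cwnd = FlightSize / 2
    uint32_t ssThresh = flightSize / 2;        // 2412
    uint32_t cwndRfc6675 = ssThresh;           // 2412

    // RFC 5681 / NewReno (RFC 6582): cwnd = ssthresh + 3 * SMSS
    uint32_t cwndNewReno = ssThresh + 3 * mss; // 4020 == (mss * 15) / 2, the test's expectation

    std::cout << "RFC 6675 cwnd: " << cwndRfc6675 << " bytes\n"
              << "NewReno  cwnd: " << cwndNewReno << " bytes\n";
    return 0;
  }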

Is it a bug in the specification or the code?
Comment 1 natale.patriciello 2017-02-06 06:43:12 EST
Hi Tom,

Right now there is no need for inflation/deflation of the window. In fact, after each dupack, the implementation frees one segment from the amount of in-flight bytes. Therefore, if LimitedTransmit is enabled, there is space to send exactly one segment.

Please note that this has the same behavior as the inflating/deflating mechanism; in the end, one segment should be sent out.
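
For illustration, a small standalone sketch (not ns-3 code; the 536-byte MSS is taken from the test discussed above) showing that the two bookkeeping styles open the same amount of send window per dupack:

  #include <cstdint>
  #include <iostream>

  int main ()
  {
    const uint32_t mss = 536;
    const uint32_t cwnd = 9 * mss;
    const uint32_t inFlight = 9 * mss;    // window fully used when the dupacks arrive

    for (uint32_t dupAcks = 1; dupAcks <= 3; ++dupAcks)
      {
        // Approach A: inflate cwnd by one MSS per dupack, leave in-flight untouched.
        uint32_t inflated = (cwnd + dupAcks * mss) - inFlight;
        // Approach B: subtract one MSS from in-flight per dupack, leave cwnd untouched.
        uint32_t deflated = cwnd - (inFlight - dupAcks * mss);
        std::cout << dupAcks << " dupack(s): window opened " << inflated
                  << " vs " << deflated << " bytes\n";
      }
    return 0;
  }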
Comment 2 Tom Henderson 2017-02-07 10:05:40 EST
(In reply to natale.patriciello from comment #1)
> Hi Tom,
> 
> Right now there is no need for inflation/deflation of the window. In fact,
> after each dupack, the implementation frees one segment from the amount of
> in-flight bytes. Therefore, if LimitedTransmit is enabled, there is space to
> send exactly one segment.
> 
> Please note that this has the same behavior as the inflating/deflating
> mechanism; in the end, one segment should be sent out.

It does not have the same behavior in the test I mentioned; fewer segments are sent off the top of the window.  NewReno sends the additional segments to avoid problems with multiple losses in a window when SACK is not used.  The current behavior appears to be Reno in this case, not NewReno.

I think we need to make the non-SACK case behave like TCP NewReno since we are advertising it as such; RFC 6675 behavior applies to the SACK-enabled case.
Comment 3 natale.patriciello 2017-02-07 11:48:58 EST
Can I look at the traces? On my system it is impossible to use NSC. Thanks.
Comment 4 natale.patriciello 2017-02-16 07:59:58 EST
Any news? I'm thinking of changing the title of the bug, since the NewReno specifications are about partial ACK management, which seems correct from your explanation.
Comment 5 Tom Henderson 2017-03-12 00:52:11 EST
Temporarily disabled ns3-tcp-cwnd until this is fixed, so as not to obscure other ns-3-dev issues.
Comment 6 Tom Henderson 2017-09-25 19:02:05 EDT
Created attachment 2923 [details]
correct packet trace
Comment 7 Tom Henderson 2017-09-25 19:02:48 EDT
Created attachment 2924 [details]
incorrect packet trace
Comment 8 Tom Henderson 2017-09-26 21:28:57 EDT
For the archive, I'm attaching a correct tcpdump trace (obtained by 'tcpdump -r tcp-cwnd-ood-0-0.pcap -nn -tt' on the PCAP generated by the test) and an incorrect trace.  The correct one is from ns-3.26; the incorrect one is from ns-3-dev @ 13073:0ad94b8b6fb4.

The main problem is that some code in CA_DISORDER and CA_RECOVERY assumes that SACK (or RFC 6675 loss recovery) is being used, and as a result, the NewReno values for cWnd are not correctly set when SACK is disabled.  There is an attempt to work around this with TcpTxBuffer::CraftSackOption(), but it is not enough.

I chose to restore some ns-3.26 code for fast retransmit/recovery phases when m_sackEnabled is false.
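
A minimal sketch of the general shape of such a change follows (this is not the actual changeset; the function and parameter names are illustrative, and only m_sackEnabled corresponds to a real member mentioned above):

  #include <cstdint>
  #include <iostream>

  // Illustrative only: branch on the SACK setting when entering fast recovery,
  // so the non-SACK path follows NewReno (RFC 5681/6582) and the SACK path
  // follows RFC 6675.
  void EnterRecoverySketch (bool sackEnabled, uint32_t flightSize,
                            uint32_t segmentSize, uint32_t &ssThresh, uint32_t &cWnd)
  {
    ssThresh = flightSize / 2;
    if (!sackEnabled)
      {
        // Classic NewReno fast retransmit: cwnd = ssthresh + 3 * SMSS.
        cWnd = ssThresh + 3 * segmentSize;
      }
    else
      {
        // RFC 6675 recovery: cwnd = ssthresh; transmission is then governed by pipe.
        cWnd = ssThresh;
      }
  }

  int main ()
  {
    uint32_t ssThresh = 0, cWnd = 0;
    EnterRecoverySketch (false, 9 * 536, 536, ssThresh, cWnd);  // non-SACK (NewReno) path
    std::cout << "NewReno path:  ssthresh " << ssThresh << ", cwnd " << cWnd << "\n";
    EnterRecoverySketch (true, 9 * 536, 536, ssThresh, cWnd);   // SACK / RFC 6675 path
    std::cout << "RFC 6675 path: ssthresh " << ssThresh << ", cwnd " << cWnd << "\n";
    return 0;
  }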

In testing this, I found that an ns3-tcp-cwnd test vector had an unneeded change event upon leaving fast recovery:

2.93531s 0 Ns3CwndTest:CwndChange(): [DEBUG] Cwnd change event 26 at +2.93531195999999999996s 5092 1608
2.93531s 0 Ns3CwndTest:CwndChange(): [DEBUG] Cwnd change event 27 at +2.93531195999999999996s 1608 2144

so I regenerated the test vectors for that test to reduce the number from 40 to 39 (all other change events now line up).

The test 'ns3-tcp-loss.cc' can also be reverted to ns-3.26 response vectors and code, if Sack is configured to false.
Comment 9 Tom Henderson 2017-09-26 21:36:11 EDT
fixed in changeset 13099:373cf44ebc8f