Bug 2649 - disabling TCP SACK does not fall back to NewReno behavior
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: tcp
Version: unspecified
Hardware: All
OS: All
Importance: P3 normal
Assigned To: natale.patriciello
Reported: 2017-02-05 21:08 EST by Tom Henderson
Modified: 2017-09-26 21:36 EDT

Attachments
correct packet trace (13.66 KB, text/plain), 2017-09-25 19:02 EDT, Tom Henderson
incorrect packet trace (14.25 KB, text/plain), 2017-09-25 19:02 EDT, Tom Henderson

Description Tom Henderson 2017-02-05 21:08:07 EST
Today I was debugging the ns3-tcp-cwnd test failure (this is a Network Simulation Cradle test suite) that has been occurring since SACK was merged.

I disabled SACK for this test case in the hope that this was the issue, but still found different behavior.

The second test case of src/test/ns3tcp/ns3tcp-cwnd-test-suite.cc fails at this point:

  // At the point of loss, sndNxt = 15545; sndUna = 9113.  FlightSize is 4824,
  // so there are 9 segments outstanding.  Cut ssthresh to 9/2 (2412) and
  // cwnd to (9/2 + 3) = 4020
  event = m_responses.Get (10);
  NS_TEST_ASSERT_MSG_EQ (event.m_newCwnd, (MSS * 15)/2, "Wrong new cwnd value in cwnd change event " << 10);

The problem appears to be that the EnterRecovery() method assumes SACK-based RFC 6675 recovery:

 (4.2) ssthresh = cwnd = (FlightSize / 2)
              The congestion window (cwnd) and slow start threshold
              (ssthresh) are reduced to half of FlightSize per [RFC5681].

However, RFC 5681 and NewReno (RFC 6582) specify:

       The lost segment starting at SND.UNA MUST be retransmitted and
       cwnd set to ssthresh plus 3*SMSS.

The test assumes that the 3*SMSS is added, which is why it currently fails.
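
For illustration, here is a minimal standalone sketch (not ns-3 code; the 536-byte MSS and the nine outstanding segments are taken from the test comment above) of the two computations being compared:

  #include <cstdint>
  #include <iostream>

  int main ()
  {
    const uint32_t mss = 536;                  // segment size used by this test
    const uint32_t flightSize = 9 * mss;       // 4824 bytes outstanding at the point of loss

    // RFC 6675 (SACK-based recovery), step (4.2): ssthresh = cwnd = FlightSize / 2
    uint32_t ssThresh = flightSize / 2;        // 2412
    uint32_t cwndRfc6675 = ssThresh;           // 2412

    // RFC 5681 / NewReno (RFC 6582): cwnd = ssthresh + 3 * SMSS
    uint32_t cwndNewReno = ssThresh + 3 * mss; // 4020 == (mss * 15) / 2, the test's expectation

    std::cout << "RFC 6675 cwnd: " << cwndRfc6675 << " bytes\n"
              << "NewReno  cwnd: " << cwndNewReno << " bytes\n";
    return 0;
  }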

Is it a bug in the specification or the code?
Comment 1 natale.patriciello 2017-02-06 06:43:12 EST
Hi Tom,

Right now there is no need for inflation/deflation of the window. In fact, after each dupack, the implementation frees one segment from the amount of in-flight bytes. Therefore, if LimitedTransmit is enabled, there is space to send exactly one segment.

Please note that this has the same behavior as the inflating/deflating mechanism; in the end, one segment should be sent out.
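
For illustration, a small standalone sketch (not ns-3 code; the 536-byte MSS is taken from the test discussed above) showing that the two bookkeeping styles open the same amount of send window per dupack:

  #include <cstdint>
  #include <iostream>

  int main ()
  {
    const uint32_t mss = 536;
    const uint32_t cwnd = 9 * mss;
    const uint32_t inFlight = 9 * mss;    // window fully used when the dupacks arrive

    for (uint32_t dupAcks = 1; dupAcks <= 3; ++dupAcks)
      {
        // Approach A: inflate cwnd by one MSS per dupack, leave in-flight untouched.
        uint32_t inflated = (cwnd + dupAcks * mss) - inFlight;
        // Approach B: subtract one MSS from in-flight per dupack, leave cwnd untouched.
        uint32_t deflated = cwnd - (inFlight - dupAcks * mss);
        std::cout << dupAcks << " dupack(s): window opened " << inflated
                  << " vs " << deflated << " bytes\n";
      }
    return 0;
  }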
Comment 2 Tom Henderson 2017-02-07 10:05:40 EST
(In reply to natale.patriciello from comment #1)
> Hi Tom,
> 
> Right now there is no need for inflation/deflation of the window. In fact,
> after each dupack, the implementation frees one segment from the amount of
> in-flight bytes. Therefore, if LimitedTransmit is enabled, there is space to
> send exactly one segment.
> 
> Please note that this has the same behavior as the inflating/deflating
> mechanism; in the end, one segment should be sent out.

It does not have the same behavior in the test I mentioned; fewer segments are sent off the top of the window.  NewReno sends the additional segments to avoid problems with multiple losses in a window when SACK is not used.  The current behavior appears to be Reno in this case, not NewReno.

I think we need to make the non-SACK case behave like TCP NewReno since we are advertising it as such; RFC 6675 behavior applies to the SACK-enabled case.
Comment 3 natale.patriciello 2017-02-07 11:48:58 EST
Can I look at the traces? On my system it is impossible to use NSC. Thanks.
Comment 4 natale.patriciello 2017-02-16 07:59:58 EST
Any news? I'm thinking of changing the title of the bug, since the NewReno specifications are about partial ACK management, which seems correct from your explanation.
Comment 5 Tom Henderson 2017-03-12 00:52:11 EST
Temporarily disabled ns3-tcp-cwnd until this is fixed, so as not to obscure other ns-3-dev issues.
Comment 6 Tom Henderson 2017-09-25 19:02:05 EDT
Created attachment 2923 [details]
correct packet trace
Comment 7 Tom Henderson 2017-09-25 19:02:48 EDT
Created attachment 2924 [details]
incorrect packet trace
Comment 8 Tom Henderson 2017-09-26 21:28:57 EDT
For the archive, I'm attaching a correct tcpdump trace (obtained by 'tcpdump -r tcp-cwnd-ood-0-0.pcap -nn -tt' on the PCAP generated by the test) and an incorrect trace.  The correct one is from ns-3.26; the incorrect one is from ns-3-dev @ 13073:0ad94b8b6fb4.

The main problem is that some code in CA_DISORDER and CA_RECOVERY assumes that SACK (or RFC 6675 loss recovery) is being used, and as a result, the NewReno values for cWnd are not correctly set when SACK is disabled.  There is an attempt to work around this with TcpTxBuffer::CraftSackOption(), but it is not enough.

I chose to restore some ns-3.26 code for fast retransmit/recovery phases when m_sackEnabled is false.
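
A minimal sketch of the general shape of such a change follows (this is not the actual changeset; the function and parameter names are illustrative, and only m_sackEnabled corresponds to a real member mentioned above):

  #include <cstdint>
  #include <iostream>

  // Illustrative only: branch on the SACK setting when entering fast recovery,
  // so the non-SACK path follows NewReno (RFC 5681/6582) and the SACK path
  // follows RFC 6675.
  void EnterRecoverySketch (bool sackEnabled, uint32_t flightSize,
                            uint32_t segmentSize, uint32_t &ssThresh, uint32_t &cWnd)
  {
    ssThresh = flightSize / 2;
    if (!sackEnabled)
      {
        // Classic NewReno fast retransmit: cwnd = ssthresh + 3 * SMSS.
        cWnd = ssThresh + 3 * segmentSize;
      }
    else
      {
        // RFC 6675 recovery: cwnd = ssthresh; transmission is then governed by pipe.
        cWnd = ssThresh;
      }
  }

  int main ()
  {
    uint32_t ssThresh = 0, cWnd = 0;
    EnterRecoverySketch (false, 9 * 536, 536, ssThresh, cWnd);  // non-SACK (NewReno) path
    std::cout << "NewReno path:  ssthresh " << ssThresh << ", cwnd " << cWnd << "\n";
    EnterRecoverySketch (true, 9 * 536, 536, ssThresh, cWnd);   // SACK / RFC 6675 path
    std::cout << "RFC 6675 path: ssthresh " << ssThresh << ", cwnd " << cWnd << "\n";
    return 0;
  }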

In testing this, I found that an ns3-tcp-cwnd test vector had an unneeded change event upon leaving fast recovery:

2.93531s 0 Ns3CwndTest:CwndChange(): [DEBUG] Cwnd change event 26 at +2.93531195999999999996s 5092 1608
2.93531s 0 Ns3CwndTest:CwndChange(): [DEBUG] Cwnd change event 27 at +2.93531195999999999996s 1608 2144

so I regenerated the test vectors for that test to reduce the number from 40 to 39 (all other change events now line up).

The test 'ns3-tcp-loss.cc' can also be reverted to ns-3.26 response vectors and code, if Sack is configured to false.
Comment 9 Tom Henderson 2017-09-26 21:36:11 EDT
fixed in changeset 13099:373cf44ebc8f