Bug 2067 - TCP performances drop when Advertised Window is larger than Sender Buffer size
TCP performances drop when Advertised Window is larger than Sender Buffer size
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: tcp
ns-3-dev
All All
: P5 critical
Assigned To: Tommaso Pecorella
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-02-17 17:04 EST by Tommaso Pecorella
Modified: 2015-11-11 23:43 EST (History)
3 users (show)

See Also:


Attachments
test program (11.94 KB, text/x-csrc)
2015-02-17 17:05 EST, Tommaso Pecorella
Details
patch (11.22 KB, patch)
2015-02-23 02:52 EST, Tommaso Pecorella
Details | Diff
mobility file needed for test program (86 bytes, text/plain)
2015-02-26 17:59 EST, Tom Henderson
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tommaso Pecorella 2015-02-17 17:04:09 EST
The bug has been found in ns-3 users group by Krishna Deepak:
https://groups.google.com/forum/#!topic/ns-3-users/16paG01UErw

After some debugging, it seems that there are some unwanted effects:
1) the RTT calculation is imprecise (there is another bug opened).
2) the cWnd seems seems to grow abnormally after a certain point.
3) the throughput drops after a given point.

The "point" is when the advertised window becomes larger than the sender window size, but more investigations are needed.
The receiver window may be involved as well.
Comment 1 Tommaso Pecorella 2015-02-17 17:05:44 EST
Created attachment 1971 [details]
test program

Test program. Disabling the Window scaling hides the bug, as without windows scaling the default sender and receiver buffer size is larger than the maximum window size.
Comment 2 Tommaso Pecorella 2015-02-23 02:52:26 EST
Created attachment 1973 [details]
patch

I'm gonna call this effect "Idiotic Window Syndrome" (IWS), to make it clear that it's not the Silly Window Syndrome, but close to.
Comment 3 Tommaso Pecorella 2015-02-23 02:54:53 EST
Addendum: the DupAck processing functions don't have the micro-delay. I'm not sure that it's necessary there, but we could add it, just to be sure.
Comment 4 natale.patriciello 2015-02-26 04:06:58 EST
I tested the patch in my test environment, I effectively see an increase of transmitted byte. However, I am not sure about the effect on RTT, it seems a little higher.

I do not experienced issues when running ns-3-dev with this patch.
Comment 5 natale.patriciello 2015-02-26 05:44:58 EST
(In reply to natale.patriciello from comment #4)
> I tested the patch in my test environment, I effectively see an increase of
> transmitted byte. However, I am not sure about the effect on RTT, it seems a
> little higher.
> 
> I do not experienced issues when running ns-3-dev with this patch.

My thought on the RTT was true. Before:

http://i.imgur.com/dby4I6k.png

After:

http://i.imgur.com/lEvzJ1P.png


You can see more utilized queues, and slightly increase in RTT.
But, sincerely, I think the new behavior is more understandable than the previous one, because now we have packets of full segment size, while before our packets were small, in some cases.
Comment 6 Tom Henderson 2015-02-26 17:59:18 EST
Created attachment 1979 [details]
mobility file needed for test program
Comment 7 Tom Henderson 2015-03-14 16:01:12 EDT
I tested this a couple of weeks ago and I also observed improved behavior with this patch.  

I was thinking at the time that it would be nice to have a broken test case in the test suite that was fixed by this patch, but I didn't have time to write it since then.
Comment 8 Tom Henderson 2015-04-12 12:04:50 EDT
I have some questions/comments on the proposed patch.

Note, I have had some difficulty (so far) writing a simple test suite program that exposes this problem, although I can reproduce it with the wireless example provided (simplifying that program away from that wireless scenario seems to hide the behavior).  That is why I have been late to comment on this.  I may try another approach.

Meanwhile, some questions inline about the proposed patch.

> diff --git a/src/applications/model/bulk-send-application.cc b/src/applications/model/bulk-send-application.cc
> --- a/src/applications/model/bulk-send-application.cc
> +++ b/src/applications/model/bulk-send-application.cc
> @@ -224,7 +224,7 @@
>  
>    if (m_connected)
>      { // Only send new data if the connection has completed
> -      Simulator::ScheduleNow (&BulkSendApplication::SendData, this);
> +      SendData ();
>      }
>  }

Is the above linked to this bug or should it be a separate issue (separate changeset)?


> diff --git a/src/internet/model/tcp-newreno.cc b/src/internet/model/tcp-newreno.cc
> --- a/src/internet/model/tcp-newreno.cc
> +++ b/src/internet/model/tcp-newreno.cc
> @@ -182,7 +182,10 @@
>      { // Increase cwnd for every additional dupack (RFC2582, sec.3 bullet #3)
>        m_cWnd += m_segmentSize;
>        NS_LOG_INFO ("Dupack in fast recovery mode. Increase cwnd to " << m_cWnd);
> -      SendPendingData (m_connected);
> +      if (!m_sendPendingDataEvent.IsRunning ())
> +        {
> +          SendPendingData (m_connected);
> +        }
>      }

Is the above the gist of the fix (avoid too many calls to SendPendingData)?

> -  m_rWnd = tcpHeader.GetWindowSize ();
> -  m_rWnd <<= m_rcvScaleFactor;
> +  m_rWnd = (uint32_t(tcpHeader.GetWindowSize ()) << m_rcvScaleFactor);

This is a separate bug that I noticed also in bug 2058; do you agree to factor this m_rWnd setting bug out to a separate changeset/bug report?

> +              ScaleSsThresh (m_sndScaleFactor);


I am not sure about this change.  This seems to scale ssthresh according to the window scale factor.  However, this is advised in the RFC to set the initial ssthresh to the initial advertised window.  It seems to me that a more direct setting of initial ssthresh would be to just set it to the initial advertised window, rather than invoking scale factors.  That is, I think this may end up being not connected (and possibly exceeding) initial advertised window.
Comment 9 Tommaso Pecorella 2015-04-12 13:29:34 EDT
I did it some time ago, so I'm trying to remember.

(In reply to Tom Henderson from comment #8)
> > diff --git a/src/applications/model/bulk-send-application.cc b/src/applications/model/bulk-send-application.cc
> > --- a/src/applications/model/bulk-send-application.cc
> > +++ b/src/applications/model/bulk-send-application.cc
> > @@ -224,7 +224,7 @@
> >  
> >    if (m_connected)
> >      { // Only send new data if the connection has completed
> > -      Simulator::ScheduleNow (&BulkSendApplication::SendData, this);
> > +      SendData ();
> >      }
> >  }
> 
> Is the above linked to this bug or should it be a separate issue (separate
> changeset)?

It's this bug actually. All the bug is about not freeing some window, send what we have in the buffer, call the upper layer, find out we could send some more, send it.
In other terms, everything could be summarized as: do everything and then send what would can as the very last thing you do.
Invoking a new event defers the TxBuffer fillip, which is part of the bug.

> > diff --git a/src/internet/model/tcp-newreno.cc b/src/internet/model/tcp-newreno.cc
> > --- a/src/internet/model/tcp-newreno.cc
> > +++ b/src/internet/model/tcp-newreno.cc
> > @@ -182,7 +182,10 @@
> >      { // Increase cwnd for every additional dupack (RFC2582, sec.3 bullet #3)
> >        m_cWnd += m_segmentSize;
> >        NS_LOG_INFO ("Dupack in fast recovery mode. Increase cwnd to " << m_cWnd);
> > -      SendPendingData (m_connected);
> > +      if (!m_sendPendingDataEvent.IsRunning ())
> > +        {
> > +          SendPendingData (m_connected);
> > +        }
> >      }
> 
> Is the above the gist of the fix (avoid too many calls to SendPendingData)?

Indeed.
The goal is to never have SendPendingData called twice at the same time.

> > -  m_rWnd = tcpHeader.GetWindowSize ();
> > -  m_rWnd <<= m_rcvScaleFactor;
> > +  m_rWnd = (uint32_t(tcpHeader.GetWindowSize ()) << m_rcvScaleFactor);
> 
> This is a separate bug that I noticed also in bug 2058; do you agree to
> factor this m_rWnd setting bug out to a separate changeset/bug report?

Of course. It's "just" because I hate to have traces jumping around. TracedValues should be changed with a single operation, they're not "normal" variables.

> > +              ScaleSsThresh (m_sndScaleFactor);
> 
> I am not sure about this change.  This seems to scale ssthresh according to
> the window scale factor.  However, this is advised in the RFC to set the
> initial ssthresh to the initial advertised window.  It seems to me that a
> more direct setting of initial ssthresh would be to just set it to the
> initial advertised window, rather than invoking scale factors.  That is, I
> think this may end up being not connected (and possibly exceeding) initial
> advertised window.

I'm neutral about this. For sure the ssThresh must be set dynamically according to the advertised window. Right now it's static, so it's a bug. If we want to split it in another patch, I'm ok.
Comment 10 Tommaso Pecorella 2015-05-05 03:30:50 EDT
Pushed in changeset 11359 c2269e7c4f42