Bug 2278

Summary: In TcpSocketBase, need to advance NextTxSequence before calling NotifySend().
Product: ns-3 Reporter: l.salameh
Component: tcpAssignee: natale.patriciello
Status: PATCH WANTED ---    
Severity: normal CC: l.salameh, ns-bugs, tomh
Priority: P5    
Version: ns-3-dev   
Hardware: All   
OS: All   
Attachments: Tcp close test
Modified tcp close test
Modified tcp close test and general test
Updated test

Description l.salameh 2016-01-23 20:12:26 EST
I am running experiments where I am modeling web page downloads, where a web page is made up of multiple small resources. Since I'm modeling HTTP 1.0, I have a scenario where each download occurs on a single TCP connection, which the server closes once it perceives that the download is done. 

The way the server terminates connections, is as follows:

The server registers a callback using the socket->SetSendCallback(), which triggers when bytes are available in the TCP send buffer.

In this callback, if we have sent all the necessary data, the callback calls socket->close().

Now the problem I'm facing happens when I have a short file downloads and losses, where the download requires 2 1500 byte packets. Also, in order to encounter this bug, I had set my initial cwnd to 4 segments.

In my test, the server tries to send the two packets making up the file at once, and due to losses, the client never receives the very first packet. After a timeout the 1st packet is retransmitted, and all is well.

But, when I receive the socket send data callback, and attempt to close the connection, TcpSocketBase's close method does the following check

if (m_txBuffer->SizeFromSequence (m_nextTxSequence) > 0)
{
}

To check whether there is any pending data left to be sent before we can finally close the connection.

Unfortunately, the method in TcpSocketBase which triggers the send callback is NewAck(), which modifies the m_nextTxSequence parameter AFTER we call NotifySend(). This means that due to the retransmit m_nextTxSequence is smaller than the maximum sequence number, and the connection stalls and never closes because it thinks that there is still data to send.

The solution is to update m_nextTxSequence BEFORE we call NotifySend(). 

I.e. change the code to the following in TcpSocketBase::NewAck() :

m_txBuffer->DiscardUpTo (ack);
  
if (ack > m_nextTxSequence)
{
  m_nextTxSequence = ack; // If advanced
}
if (GetTxAvailable () > 0)
{
  NotifySend (GetTxAvailable ());
}


If needed, I can try to write a test script that triggers this bug, outside of the large scenario that I'm running myself.
Comment 1 natale.patriciello 2016-01-25 08:08:40 EST
Uhm Uhm Uhm.

This is not directly related to the TCP socket, but instead how the application is handling the connection.

The connection should be closed only when all the data is transmitted into Tcp buffer; and for this, you should rely on the value returned by the "Send" API, not variables inside the Socket.

I am supposing you wrote your own applications; please, see how BulkSendApp and OnOffApp are handling the connection management. Anyway, can you also post snippet of your applications ?
Comment 2 l.salameh 2016-01-25 08:55:15 EST
> Uhm Uhm Uhm.
> 
> This is not directly related to the TCP socket, but instead how the
> application is handling the connection.
> 

Indeed, this bug involves how an application interacts with the TCP stack. 

> The connection should be closed only when all the data is transmitted into
> Tcp buffer; and for this, you should rely on the value returned by the
> "Send" API, not variables inside the Socket.
> 

In my code, which you will see below, I am attempting to close a connection only using the socket->Close() API, and I am not accessing any socket internal variables. The TcpSocketBase API checks to see whether all the data has been already transmitted in the TxBuffer, and if not, sets a variable to allow closing the socket in the future.

Unfortunately, in the scenario I described, the close API thinks there is still data to transmit, where there in fact is none. This inconsistency arises because of drops and retransmits, and the fact that m_nextTxSequence is not updated correctly before we call NotifySend.

My suggested fix is that we should update all the Tx Buffer variables in TcpSocketBase correctly BEFORE we call NotifySend, which will have side effects.

> I am supposing you wrote your own applications; please, see how BulkSendApp
> and OnOffApp are handling the connection management. Anyway, can you also
> post snippet of your applications ?

I have a snippet of my server code below:

I register a socket callback as follows:

void HTTP1_0ServerConnection::ConfigureSocketCallbacks(Ptr<Socket> socket)
{
  socket->SetCloseCallbacks (MakeCallback (&HTTP1_0ServerConnection::HandleRemoteClose, this),
                               MakeCallback (&HTTP1_0ServerConnection::HandleRemoteError, this));
    
  socket->SetSendCallback(MakeCallback(&HTTP1_0ServerConnection::HandleBytesAvailable, this));
  socket->SetRecvCallback (MakeCallback (&HTTP1_0ServerConnection::HandleReceive, this));
}

The server sends packets using the following function:

void HTTP1_0ServerConnection::SendPacket(Ptr<Packet> packet)
{
    m_currentResponse = packet;
    
    //Chunk the packet in order to fit it in the transmit buffer
    //If the size of the packet is larger than tx available, then split it
    uint32_t bytes = m_socket->GetTxAvailable();
    
    uint32_t size = packet->GetSize();
    uint32_t offset = min(size, bytes);
    
    NS_ASSERT(offset > 0);
    
    Ptr<Packet> responseFragment = m_currentResponse;
    
    if (offset != size){
      responseFragment = packet->CreateFragment(0, offset);
    }
    
    m_responseOffset = offset;
    
    m_socket->Send(responseFragment, 0);
}

Finally, the data sent callback which actually closes the connection once the data is sent is the following:

void HTTP1_0ServerConnection::HandleBytesAvailable(Ptr<Socket> socket, uint32_t bytesAvailable)
{
    //Make sure we are actually sending a response
    if (m_currentResponse != nullptr)
    {
      NS_ASSERT(m_responseOffset >= 0);
      
      //Create a packet with the remaining bytes
      
      uint32_t bytesRemaining = m_currentResponse->GetSize() - m_responseOffset;
      
      uint32_t packetSize = min(bytesRemaining, bytesAvailable);
      
      if (packetSize > 0){
        
        Ptr<Packet> response = m_currentResponse->CreateFragment(m_responseOffset, packetSize);
        m_responseOffset = m_responseOffset + packetSize;
        
        socket->Send(response, 0);
      }
      else{
        
        m_currentResponse = nullptr;
        m_responseOffset = 0;
        
        //If the connection is not keep alive, terminate it
        if (!m_keepAlive)
        {
          m_socket->Close();
          m_socket->SetRecvCallback(MakeNullCallback<void, Ptr<Socket>>());
        }
      }
    }
}

Hopefully, this clarifies why the bug is happening, if not, I will be happy to try to pare down my original code to a test case which will trigger the bug.
Comment 3 natale.patriciello 2016-01-26 05:40:28 EST
Created attachment 2254 [details]
Tcp close test

Hi,

I wrote this simple script to simulate the scenario you are describing. This is the output when you enable logging:

0s TcpCloseTestSuite:Tx(): Sender TX: 49153 > 4477 [SYN] Seq=0 Ack=0 Win=32768 ns3::TcpOptionWinScale(2) ns3::TcpOptionTS(0;0) size 36
0.5s TcpCloseTestSuite:Rx(): Receiver RX: 49153 > 4477 [SYN] Seq=0 Ack=0 Win=32768 ns3::TcpOptionWinScale(2) ns3::TcpOptionTS(0;0) ns3::TcpOptionEnd(EOL) size 0
0.5s TcpCloseTestSuite:Tx(): Receiver TX: 4477 > 49153 [SYN|ACK] Seq=0 Ack=1 Win=32768 ns3::TcpOptionWinScale(2) ns3::TcpOptionTS(500;0) size 36
1s TcpCloseTestSuite:Rx(): Sender RX: 4477 > 49153 [SYN|ACK] Seq=0 Ack=1 Win=32768 ns3::TcpOptionWinScale(2) ns3::TcpOptionTS(500;0) ns3::TcpOptionEnd(EOL) size 0
1s TcpCloseTestSuite:Tx(): Sender TX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 ns3::TcpOptionTS(1000;500) size 32
1.5s TcpCloseTestSuite:Rx(): Receiver RX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 ns3::TcpOptionTS(1000;500) ns3::TcpOptionEnd(EOL) size 0
10s TcpCloseTestSuite:Tx(): Sender TX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 ns3::TcpOptionTS(10000;500) size 732
10.001s TcpCloseTestSuite:Tx(): Sender TX: 49153 > 4477 [ACK] Seq=701 Ack=1 Win=32768 ns3::TcpOptionTS(10001;500) size 732
10.002s TcpCloseTestSuite:Tx(): Sender TX: 49153 > 4477 [FIN|ACK] Seq=1401 Ack=1 Win=32768 ns3::TcpOptionTS(10002;500) size 32
10.5s TcpCloseTestSuite:PktDropped(): Dropped 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 ns3::TcpOptionTS(10000;500) ns3::TcpOptionEnd(EOL)
10.501s TcpCloseTestSuite:Rx(): Receiver RX: 49153 > 4477 [ACK] Seq=701 Ack=1 Win=32768 ns3::TcpOptionTS(10001;500) ns3::TcpOptionEnd(EOL) size 700
10.501s TcpCloseTestSuite:Tx(): Receiver TX: 4477 > 49153 [ACK] Seq=1 Ack=1 Win=32768 ns3::TcpOptionTS(10501;1000) size 32
10.502s TcpCloseTestSuite:Rx(): Receiver RX: 49153 > 4477 [FIN|ACK] Seq=1401 Ack=1 Win=32768 ns3::TcpOptionTS(10002;500) ns3::TcpOptionEnd(EOL) size 0
11.001s TcpCloseTestSuite:Tx(): Sender TX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 ns3::TcpOptionTS(11001;500) size 1032
11.001s TcpCloseTestSuite:Rx(): Sender RX: 4477 > 49153 [ACK] Seq=1 Ack=1 Win=32768 ns3::TcpOptionTS(10501;1000) ns3::TcpOptionEnd(EOL) size 0
11.501s TcpCloseTestSuite:Rx(): Receiver RX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 ns3::TcpOptionTS(11001;500) ns3::TcpOptionEnd(EOL) size 1000
11.501s TcpCloseTestSuite:Tx(): Receiver TX: 4477 > 49153 [ACK] Seq=1 Ack=1402 Win=32768 ns3::TcpOptionTS(11501;11001) size 32
11.501s TcpCloseTestSuite:Tx(): Receiver TX: 4477 > 49153 [FIN|ACK] Seq=1 Ack=1402 Win=32768 ns3::TcpOptionTS(11501;11001) size 32
12.001s TcpCloseTestSuite:Rx(): Sender RX: 4477 > 49153 [ACK] Seq=1 Ack=1402 Win=32768 ns3::TcpOptionTS(11501;11001) ns3::TcpOptionEnd(EOL) size 0
12.001s TcpCloseTestSuite:Rx(): Sender RX: 4477 > 49153 [FIN|ACK] Seq=1 Ack=1402 Win=32768 ns3::TcpOptionTS(11501;11001) ns3::TcpOptionEnd(EOL) size 0
12.001s TcpCloseTestSuite:Tx(): Sender TX: 49153 > 4477 [ACK] Seq=1402 Ack=2 Win=32768 ns3::TcpOptionTS(12001;11501) size 32
12.501s TcpCloseTestSuite:Rx(): Receiver RX: 49153 > 4477 [ACK] Seq=1402 Ack=2 Win=32768 ns3::TcpOptionTS(12001;11501) ns3::TcpOptionEnd(EOL) size 0
PASS tcp-close 0.020 s


It drops the very first data packet; but everything works fine. This because moving the conditional statement that you pointed out, does not influence the test result, since m_nextTxSequence is incremented in SendDataPacket well before the packet is dropped. I.e:

Status: m_nextTxSeq = 1
-> App sends 700  byte
-> Tcp sends 700 byte + 32 header

Status: m_nextTxSeq = 701, incremented at the end of SendDataPacket
-> App sends 700 byte
-> Tcp sends 700 byte +  32 header
-> App finishes its data, call Close ()
-> Tcp sends a FIN

From your case, it seems that somewhere m_nextTxSeq is not incremented, but you send anyway some data out. This is the real bug, and not what happens in NewAck (and I think that if can be removed, since it is strange to receive an ACK for data we haven't transmitted yet).
Comment 4 l.salameh 2016-01-30 13:35:50 EST
Created attachment 2258 [details]
Modified tcp close test
Comment 5 l.salameh 2016-01-30 13:37:24 EST
Created attachment 2259 [details]
Modified tcp close test and general test
Comment 6 l.salameh 2016-01-30 13:43:48 EST
Hi,

I've attached a slightly modified tcp-general-test.cc and tcp-close-test.cc.

I've removed the socket->Close() from SendPacket in TcpGeneralTest.

Instead, I've added the SendCallback to the general test, and in the tcp close test, I try to close the socket in the send callback.

The result is the following:

Sender TX: 49153 > 4477 [SYN] Seq=0 Ack=0 Win=32768 ns3::TcpOptionWinScale(2) size 24
Receiver RX: 49153 > 4477 [SYN] Seq=0 Ack=0 Win=32768 ns3::TcpOptionWinScale(2) ns3::TcpOptionEnd(EOL) size 0
Receiver TX: 4477 > 49153 [SYN|ACK] Seq=0 Ack=1 Win=32768 ns3::TcpOptionWinScale(2) size 24
Sender RX: 4477 > 49153 [SYN|ACK] Seq=0 Ack=1 Win=32768 ns3::TcpOptionWinScale(2) ns3::TcpOptionEnd(EOL) size 0
Sender TX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 size 20
Receiver RX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 size 0
Sender TX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 size 1472
Sender TX: 49153 > 4477 [ACK] Seq=1453 Ack=1 Win=32768 size 1472
Dropped 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768
Receiver RX: 49153 > 4477 [ACK] Seq=1453 Ack=1 Win=32768 size 1452
Receiver TX: 4477 > 49153 [ACK] Seq=1 Ack=1 Win=32768 size 20
Sender RX: 4477 > 49153 [ACK] Seq=1 Ack=1 Win=32768 size 0
Sender TX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 size 1472
Receiver RX: 49153 > 4477 [ACK] Seq=1 Ack=1 Win=32768 size 1452
Receiver TX: 4477 > 49153 [ACK] Seq=1 Ack=2905 Win=32768 size 20
Sender RX: 4477 > 49153 [ACK] Seq=1 Ack=2905 Win=32768 size 0
FAIL tcp-close 0.040 s

Even with the fix I had suggested in my original bug description, I ran into this bug once again under a different scenario so my fix is not enough.

The basic problem is that:
Calling socket->Close() when the socket still has more data to send in its transmit buffer will set the m_closeOnEmpty. But it seems like when the socket's TCP connection is experiencing drops and retransmits, the socket will not attempt to close itself after it has recovered from drops and transmitted the remaining segments in its transmit buffer.

Once the m_closeOnEmpty = true, the tcp-socket-base code should always close the connection when the transmit buffer finally becomes empty.
Comment 7 natale.patriciello 2018-02-25 09:14:28 EST
Created attachment 3065 [details]
Updated test

With the latest changes, the test do not crash anymore.