Bugzilla – Bug 2141
TCP data sent callback is not called for certain send events
Last modified: 2015-12-17 06:40:39 EST
Created attachment 2067 [details] Packet capture file for the TCP flow exhibiting the data sent callback bug I am observing a condition where a TcpNewReno socket does not report all of the data sent using the SetDataSentCallback() feature. The callback works as it should almost all of the time, but we have observed one simulation run where one of these callbacks did not occur. This issue is occurring with ns-3 3.23. Here is our setup. We have a custom NetDevice class that utilizes TCP sockets for tunneling data, and this class monitors the amount of data queued on the send side of each TCP socket. It computes the amount of data queued by keeping track of the amount of data passed into the TCP socket Send() method minus the amount of data sent as reported via the SetDataSentCallback() callback. Our NetDevice subclass limits how large this queue size can be, and stops sending if too much data is queued up. In the simulation run in question, one of the NetDevices had delivered all of its data to the NetDevice at the other end of the TCP connection, yet the queue size measurement was stuck at a value greater than zero. In fact, it was large enough that our NetDevice code stopped sending any more data into the TCP socket. This bug occurs as part of a large and complex simulation, and has only ever been observed happening once. Thus, I am unable to provide a simple test case that reproduces the bug. However, the issue was repeatable when run on one of our simulation computers, and I was able to instrument the code and figure out what was happening. What follows is a summary of what I found. During the period of interest, the TCP m_txBuffer has the following user data in it (sequence numbers shown at the bottom): +------------+--------------+--------------+--------------+ | 40 bytes | 1052 bytes | 1052 bytes | 1052 bytes | +------------+--------------+--------------+--------------+ ^ ^ ^ ^ ^ 72697 72737 73789 74841 75893 Here is a summary of what happens at various times. My instrumentation was not all-inclusive, so I do not have some of the exact event times, but all of the events that were captured are listed in the sequence that they occurred. Note that the TCP connection is already established and has been sending packets for a short time before these events. I am including the resulting m_txBuffer head and tail values, as well as the resulting TCP m_nextTxSeq and m_highTxMark values, after the event is over. I also marked the transmissions that resulted in a "data sent callback" as "DSCB". Time Events Head Tail nextTxSeq highTxMark ------ -------------------------------- ----- ----- --------- ---------- 3.0303 Queue 72697 (40 bytes) 67437 72737 69541 70593 3.03351 Recv ACK 69541, 69541 72737 72697 72697 Retrans 69541 (1052 bytes), Send 70593 (1052 bytes) DSCB, Send 71645 (1052 bytes) DSCB 3.03859 Queue 72737 (1052 bytes) 69541 73789 72697 72697 3.05121 Recv ACK 70593, 70593 73789 73789 73789 Send 72697 (1052 b) DSCB, Send 73749 (40 b) DSCB 3.05121 Queue 73789 (1052 bytes) 70593 74841 73789 73789 3.05628 Queue 74841 (1052 bytes) 70593 75893 73789 73789 3.????? Retrans 70593 (1052 bytes) 70593 75893 73789 73789 3.????? Retrans 70593 (1052 bytes) 70593 75893 71645 73789 3.????? Recv ACK 72697, 72697 75893 74801 74801 Retrans 72697 (1052 bytes), Retrans 73749 (1052 bytes) **** 3.10437 Recv ACK 73789, 73789 75893 75893 75893 Send 74801 (1052 bytes) DSCB, Send 75853 (40 bytes) DSCB 3.????? Retrans 73789 (1052 bytes) In the event marked with "****", the packet is a retransmission of seq 73749 through 73788 (40 bytes), and the original transmission of seq 73789 through 74800 (1012 bytes). However, the data sent callback is not called for the transmission of 1012 bytes. Attached is a pcap file for the TCP flow. In viewing this file in wireshark, I noticed that the time stamps are all a bit early compared to the times listed above. I'm not sure why this is. Packet number 181 in this trace is the packet that contains data that should be reported to the data sent callback.
I have found the code that is causing this issue. It is in the tcp-socket-base.cc file, in the TcpSocketBase::SendDataPacket() method. On line 2068, the conditional test is not sufficient for cases where a packet sent includes both retransmission bytes and original transmission bytes. Likewise, the number of bytes passed to the callback on line 2070 needs to be updated. I have made a fix for this issue and tested is using the problematic simulation. With this fix applied, the data sent callbacks are all occurring like they should be. I am attaching a patch for this file.
Created attachment 2068 [details] Proposed patch for TCP data sent callback bug
Thanks for a detailed bug report and the patch. We should also create a test suite for this callback operation (over a lossy channel) before closing this bug.
Created attachment 2210 [details] Updated patch Hi, I have investigated this bug and I don't understand why you use such a conditional statement. Wouldn't that be enough? See my patch. (Test case is coming)
Created attachment 2211 [details] Proposed test case This is the testcase: we run various transmission, and check if at the end the amount of data notified is the same than the amount of data transmitted at application level. To simulate your environment, I used a TcpSocket that, the 50% of the time, saves in its buffer only the 50% of the packet, forcing on one packet the situation you described. Actual ns-3-dev fails; with any of our patches, it does not raise any error.
Created attachment 2212 [details] One-file patch for test case and bug correction I've checked. Your condition is the following: (seq + sz > highTxMark && highTxMark >= seq) I advocate that the latter part is not necessary, since highTxMark is always >= seq (or we have holes in the transmission.. For instance, if highTxMark < seq, it means that we have not transmitted the piece from highTxMark to seq). I'm pushing these after checking that other tests does not fail, adding you as contributor. Thanks!
Fixed in 11789:fbe8b99cb680