Bug 341 - Get unexpected dropped packets when using SetSendCallback with heavy traffic
Get unexpected dropped packets when using SetSendCallback with heavy traffic
Status: CLOSED INVALID
Product: ns-3
Classification: Unclassified
Component: devices
ns-3-dev
All All
: P2 critical
Assigned To: Craig Dowell
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-09-12 14:19 EDT by evensky
Modified: 2012-03-17 19:40 EDT (History)
2 users (show)

See Also:


Attachments
sends data via PacketSocket from one node to another (9.34 KB, text/plain)
2008-09-12 14:28 EDT, evensky
Details
updated version, now prints out the size of the tx Queue before the Send. (9.94 KB, text/plain)
2008-09-15 17:18 EDT, evensky
Details
example of a failed run. (44.10 KB, text/plain)
2008-09-15 19:17 EDT, evensky
Details
Patch to add NetDevice::SetSendReadyCallback (2.27 KB, patch)
2008-09-24 17:17 EDT, evensky
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description evensky 2008-09-12 14:19:39 EDT
I have a test program that when run as:
./p2p_test --size=10
....
0ns: node1: InjectPacket(0x7fff3dcf93f0,1000000000ns,A,10)
0ns: node1:inject frag:Payload (size=10)   10:A{10}
...
1000000000ns: node1: SendPacket(0x7fff3dcf93f0,0x6363b0)
1000000000ns: node1:forwarding:Payload (size=10)   10:A{10}
....
Send returned 0
1000000959ns: node2: HandleRead(0x7fff3dcf9370,0x635f10)
1000000959ns: node2:recv:Payload (size=10) left{0}    10:A{10}
1000000959ns: node2: Packet Delivered

but when I use
evensky@waltz:~/tools/networks/ns3toys/layer2$ ./p2p_test --size=10000000
...
1000000000ns: node1: SendPacket(0x7fffd9059750,0x63c9d0)
1000000000ns: node1:forwarding:Payload (size=50000)50000:A{50000}
m_socket{0x635450}->Send(packet): sock= 0x635450
Send returned 0
1000000000ns: node1: SendPacket(0x7fffd9059750,0x63cad0)
1000000000ns: node1:forwarding:Payload (size=50000)50000:A{50000}
m_socket{0x635450}->Send(packet): sock= 0x635450
Send returned -1
...

and it doesn't print out that the whole packet was delivered.

It seems to fail for size = 5050001
 >./p2p_test --size=5050000|fgrep -e '-1'
 >./p2p_test --size=5050001|fgrep -e '-1'
 Send returned -1

Test program is attached.
Comment 1 evensky 2008-09-12 14:28:48 EDT
Created attachment 253 [details]
sends data via PacketSocket from one node to another

This doesn't have the cleanest output, but I wanted to get it out. If its too hard to read, I can work on that. What it does is take a user specified size for a data stream and break it into FragSize (50000 bytes for ease of use) packets and sends that to the receiver. The receiver prints out the packets it gets and decrements a byte count that main set. When the count hits zero it prints out that the packet has been delivered. This should be the last thing printed.

If USE_HANDLESEND is defined, then we set a SendCallback in the sender. I try to only send whole packets to make life a little easier.

This test program generates the same output if USE_HANDLESEND is not defined.
Comment 2 evensky 2008-09-15 17:18:52 EDT
Created attachment 255 [details]
updated version, now prints out the size of the tx Queue before the Send.
Comment 3 evensky 2008-09-15 18:52:15 EDT
To clarify, what I expect to happen is that PacketSocket::Send will not return -1, and to the best of my knowledge, the number of items in the DropTail queue, contained in av, returned via side effect in node::GetDevice(0)GetAttribute("TxQueue",av) will not increase. NOTE: that the current version has the code that uses SetSendCallback #ifdef'ed out, so this is just using PacketQueue::Send.

What I am seeing is each call to SrcRtSwitch::SendPacket seems to add another item to the Queue, but nothing ever seems to take things off of it, so that after 100 Sends, the DropTail queue fills up and the Send in the above SendPacket method fails.
Comment 4 evensky 2008-09-15 19:17:56 EDT
Created attachment 256 [details]
example of a failed run.

To generate the error attached I did a:
./p2p_test --size=5050001 > failed.output

The uploaded file, failed.output, has the following text inserted in it to show where the error occurs. What I expected for correct operation was that this last byte would be written out and the 'Send returned -1' that is printed after the NOTE, would have written 'Send returned 0' as it had for previous Sends. 

NOTE: The line below is the first failed packet. Before we try to do a send around line 310 in SrcRtSwitch::SendPacket(), the queue has filled up. I don't think it should do that. When we do try the Send in SendPacket, it returns -1 indicating a failure.
Comment 5 Mathieu Lacage 2008-09-16 01:17:03 EDT
the following untested patch should fix your problem:
diff -r a60aeddab0fe src/node/packet-socket.cc
--- a/src/node/packet-socket.cc	Sun Sep 14 17:55:30 2008 -0700
+++ b/src/node/packet-socket.cc	Mon Sep 15 22:16:54 2008 -0700
@@ -26,6 +26,7 @@
 #include "ns3/packet.h"
 #include "ns3/uinteger.h"
 #include "ns3/trace-source-accessor.h"
+#include "ns3/simulator.h"
 
 #include <algorithm>
 
@@ -328,7 +329,8 @@ PacketSocket::SendTo (Ptr<Packet> p, uin
     }
   if (!error)
     {
-      NotifyDataSent (p->GetSize ());
+      Simulator::ScheduleNow (&PacketSocket::NotifyDataSent, this, p->GetSize ());
+      //NotifyDataSent (p->GetSize ());
     }
 
   if (error)
Comment 6 evensky 2008-09-16 13:14:28 EDT
Tried Mathieu's untested patch, but it didn't work for me.
Comment 7 Mathieu Lacage 2008-09-16 13:23:27 EDT
yes, it occured to me that it would not work. This is a pretty deep/tricky bug.
Comment 8 evensky 2008-09-24 17:17:45 EDT
Created attachment 262 [details]
Patch to add NetDevice::SetSendReadyCallback

As a starting point for a eventual soln at the Socket level. This patch adds a callback members NetDevice::m_sendReadyCallback, of type NetDevice::SendReadyCallback, and a method to set it, NetDevice::SetSendReadyCallback. If the callback is defined, it is invoked when a Packet is removed from the Queue. net-device.h has a stub definition, and PointToPointNetDevice has a defn that does something.
Comment 9 Faker Moatamri 2009-11-06 04:43:56 EST
(In reply to comment #8)
> Created an attachment (id=262) [details]
> Patch to add NetDevice::SetSendReadyCallback
> 
> As a starting point for a eventual soln at the Socket level. This patch adds a
> callback members NetDevice::m_sendReadyCallback, of type
> NetDevice::SendReadyCallback, and a method to set it,
> NetDevice::SetSendReadyCallback. If the callback is defined, it is invoked when
> a Packet is removed from the Queue. net-device.h has a stub definition, and
> PointToPointNetDevice has a defn that does something.
> 
Hi David,
Did you test this patch? does it work?
Best regards
Faker Moatamri
Comment 10 evensky 2009-11-06 14:08:50 EST
(In reply to comment #9)
> (In reply to comment #8)
> > Created an attachment (id=262) [details] [details]
> > Patch to add NetDevice::SetSendReadyCallback
> > 
> > As a starting point for a eventual soln at the Socket level. This patch adds a
> > callback members NetDevice::m_sendReadyCallback, of type
> > NetDevice::SendReadyCallback, and a method to set it,
> > NetDevice::SetSendReadyCallback. If the callback is defined, it is invoked when
> > a Packet is removed from the Queue. net-device.h has a stub definition, and
> > PointToPointNetDevice has a defn that does something.
> > 
> Hi David,
> Did you test this patch? does it work?
> Best regards
> Faker Moatamri

Faker,

This was a while ago, and I just don't recall. We're not using ns3 anymore (ns3 is great, but our problem turned out not to be a good match for ns3), so I haven't been active in ns3 for a while. I am certain that If I had posted it to the world I would have tested it. So it probably does what it says, but I recall that it wasn't the eventual solution we were shooting for.

\dae


Comment 11 Craig Dowell 2010-02-26 19:15:42 EST
It's been a few days since this bug was reported, so I thought I would take a look at it :-)

I may be missing something, but as far as I can make out, the system is working as expected here.

What happens is that at time 0, InjectPaket is called, which splits a gigantic transfer up into 50,000 byte chuunks.  Some number of Packets are created and a SendPacket is scheduled for each of these packets, at a time separated by one nanosecond.  Each of these SendPacket calls is executed at the expected time.

In the SendPacket event, if HANDLESEND is not defined, a socket send is performed for each chunk.  Since the link is running at 10 megabits per second, and 50,000 byte packets are being scheduled every nanosecond, the underlying queue in the point-to-point net device eventually fills up and overflows.  

The first packet is immediately pulled off the transmit queue, so it overflows when one hundred two 50,000 byte packets are queued; so if you set size = 50000 * 101 = 5,050,000 it works, but size >= 5,050,001 causes a buffer overflow.  When the point-to-point transmit queye overflows, packets are dropped; and this is completely expected.

If HANDLESEND is defined, in SendPacket a check for m_socket->GetTxAvailable is done before trying to send bits.  This call always returns the MTU of the link, and so always succeeds since MTU is set to 50,000 bytes (the chunk size).  This check doesn't accomplish anything since when using PacketSockets, m_socket->GetTxAvailable returns MTU.  This is because there is not necessarily a way to get at a transmit queue in a device that may or may not exist (will not exist for Wifi0.  This is probably a documentation issue in the packet socket code (there is no Doxygen for any of the PacketSocket methods :-)

If HANDLESEND is defined in the test code, it expects GetTxAvailable to return something meaningful if the point-to-point device transmit queue starts filling up.  Let's assume for a moment that it does.  The test code tries to use SetSendCallback as if it were working as it would in a TCP socket -- as if there were some kind of backpressure in the system -- there isn't.  

In a TCP socket, the send callback is called when data is ACKed.  In a PacketSocket socket, the send callback is called when data is sent to the net device.  So even if GetTxAvailable were returning some value less than MTU, there is no inter-layer mechanism available to regulate backpressure, and therefore no backpressure.

It would be possible to plumb some backpressure mechanism into the stack for UDP, but I think that would be unusual and would definitely be a new feature.

So, the bottom line is:  I believe the system is working as designed and this bug is invalid.  We could clear up some issues with documentation if all of this isn't intuitively obvious; but I think this should be closed and marked invalid.
Comment 12 Craig Dowell 2010-03-09 13:14:11 EST
No response, so I'm closing this one,