Bugzilla – Full Text Bug Listing |
Summary: | Hidden node scenario leads to ARP failure | ||
---|---|---|---|
Product: | ns-3 | Reporter: | Tom Henderson <tomh> |
Component: | internet | Assignee: | George Riley <riley> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | john.abraham.in, nikkipui, ns-bugs, tommaso.pecorella |
Priority: | P5 | ||
Version: | pre-release | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | |||
Bug Blocks: | 1190, 1606, 1629 | ||
Attachments: |
A simulation that shows the problem.
Receive rate (Fixed Arp vs Random Arp) Introduce jitter patch chart comparison with patch Diff with UniformVariable as StringValue and IPv6 solicitation jitter Patch with minor modification to ipv6 Sample program for testing phy rx drop with ipv6 Docs and helper functions |
Description
Tom Henderson
2012-11-06 08:52:24 EST
Created attachment 1484 [details]
A simulation that shows the problem.
From my understanding of the linux 2.6 implementation of neighbor.c, there is no randomness explicitly added during the retry of ARP requests. http://lxr.free-electrons.com/source/net/core/neighbour.c 931 } else { 932 /* NUD_PROBE|NUD_INCOMPLETE */ 933 next = now + neigh->parms->retrans_time; 934 } Such a scenario is not reproducible in the real-world due to randomness at layers not directly related layer 2 such as 1. time at which the application in a node starts traffic 2. current processing state of a node (the time it takes for a node to actually request the kernel protocol module is a function of the CPU load at that time). I wonder if randomness introduced at application sources helps in this simulation. Created attachment 1626 [details]
Receive rate (Fixed Arp vs Random Arp)
After refreshing my memory on this case: Here randomizing the start time of the application sources won't help. The ARP collision here is because 2 nodes B and C received RREQ at the same time from A (which is possible due to speed of light propagation delay). A wants the route to D. B and C both have routes to D and want to reply to A via an unicast RREP. Both B and C initiate ARP bcast, which keeps colliding at A. ARP retries by B and C are scheduled at exactly the same time, and the collisions continue. Due to this perfect synchronization, Carrier Sense may not work, as each node wait for exactly the same DIFS, starting their timers at the same time. 1. In the real world "random node processing delay" can help bypass these collisions. 2. The next question is where do we put this processing delay ? a. Before the frame enters the channel or getting a lock for the network adapter? b. Before the routing layers requests the arp layer/neighbor management module? c. Delay while a node searches its extensive arp cache? d. Delay in the interface between user and kernel space modules. So, I picked one and tried to introduce a random (1..10 ms) delay between arp retries, I found that: manet-routing-compare, shows a marginal improvement in average throughput (9.085490196 for fixed ARP) , (9.276235294 for random ARP). See also attached picture What this does prove is that 1. the throughput for the manet-routing-compare example is impacted by arp collision. It does not prove: 1. Random ARP retry improves the throughput. It may /may not. In any case, I think we need a model for node-processing delay, somewhere. How did ns-2 deal with this? Created attachment 1628 [details]
Introduce jitter patch
Before patch:
John-Abrahams-MacBook-Pro:ns-3-dev remote$ export NS_LOG=UdpEchoServerApplication
John-Abrahams-MacBook-Pro:ns-3-dev remote$ ./waf --run test-arp &>a.out
John-Abrahams-MacBook-Pro:ns-3-dev remote$ grep Echoing a.out | wc -l
0
manet-routing-compare received packets:
olsr: 2602
aodv: 1822
dsdv: 1784
dsr: 2872
After patch:
John-Abrahams-MacBook-Pro:ns-3-dev remote$ export NS_LOG=UdpEchoServerApplication
John-Abrahams-MacBook-Pro:ns-3-dev remote$ ./waf --run "test-arp" &> a.out
John-Abrahams-MacBook-Pro:ns-3-dev remote$ grep Echoing a.out | wc -l
100
manet-routing-compare received packets:
olsr: 2650
aodv: 2040
dsdv: 1958
dsr: 3486
Description of patch:
ArpL3Protocol does not immediately send an ARP request. It will wait for a duration defined
by a uniform random-variable between 0 and m_maxRequestJitter.
This should provide the desired node-processing delay.
Created attachment 1629 [details]
chart comparison with patch
A couple of comments on the patch. If you instead code the attribute as a StringValue like this (example drawn from random-waypoint-mobility-model.cc): .AddAttribute ("Speed", "A random variable used to pick the speed of a random waypoint model.", StringValue ("ns3::UniformRandomVariable[Min=0.3|Max=0.7]"), MakePointerAccessor (&RandomWaypointMobilityModel::m_speed), MakePointerChecker<RandomVariableStream> ()) you will be able to access not only the time value but also the stream number in the config-store system, e.g.: value /$ns3::NodeListPriv/NodeList/0/$ns3::Node/$ns3::RandomWaypointMobilityModel/Speed/$ns3::UniformRandomVariable/Min "0" value /$ns3::NodeListPriv/NodeList/0/$ns3::Node/$ns3::RandomWaypointMobilityModel/Speed/$ns3::UniformRandomVariable/Max "20" value /$ns3::NodeListPriv/NodeList/0/$ns3::Node/$ns3::RandomWaypointMobilityModel/Speed/$ns3::UniformRandomVariable/Stream "2" value /$ns3::NodeListPriv/NodeList/0/$ns3::Node/$ns3::RandomWaypointMobilityModel/Speed/$ns3::UniformRandomVariable/Antithetic "false" Also, we should fix IPv6 neighbor solicitation likewise. I agree in general with trying to add this variability, but we'll have to document this prominently as changed behavior and it will perturb the few trace-based tests that we have. Created attachment 1630 [details]
Diff with UniformVariable as StringValue and IPv6 solicitation jitter
Diff with UniformVariable as StringValue and IPv6 solicitation jitter.
I had to create a function "DelayedSendMessage" due to overload ambiguity when using function pointer.
The protection here for ipv6 is for solicitations which use multicasts.
Any suggestions on how to test the ipv6 patch? AODV only supports ipv4
I guess that the patch can be checked without a routing protocol like AODV. StaticRouting do RS/RA and NS/NA as well. Simply using examples/ipv6/icmpv6-redirect will give you all the messages. Probably you'll need to add some more nodes to see "collisions", tho. (In reply to comment #9) > I guess that the patch can be checked without a routing protocol like AODV. > StaticRouting do RS/RA and NS/NA as well. > > Simply using examples/ipv6/icmpv6-redirect will give you all the messages. > Probably you'll need to add some more nodes to see "collisions", tho. I have a wifi simulation which uses ipv6, unfortunately the ipv6 patch improves the situation in one case and degrades it another case. Needs further test on ipv6 part (In reply to comment #10) > (In reply to comment #9) > > I guess that the patch can be checked without a routing protocol like AODV. > > StaticRouting do RS/RA and NS/NA as well. > > > > Simply using examples/ipv6/icmpv6-redirect will give you all the messages. > > Probably you'll need to add some more nodes to see "collisions", tho. > > I have a wifi simulation which uses ipv6, unfortunately the ipv6 patch improves > the situation in one case and degrades it another case. > Needs further test on ipv6 part I have changed the ipv6 part of the patch to add some logs I have an adapted version of wifi-hidden-terminal for ipv6 using only EchoClient/EchoServer The drop at Wifi Phy With patch RxDrop Count : 640 Without patch RxDrop Count : 669 I have attached a new patch and sample program. But I feel more tests need to be done. And I don't normally work with ipv6 Created attachment 1657 [details]
Patch with minor modification to ipv6
Created attachment 1658 [details]
Sample program for testing phy rx drop with ipv6
Hi John, I checked the IPv6's part and all seems good, so feel free to go on. Just some suggestions. The first one is about the jitter delay attribute naming and placement. I'd rather have the same name and the same placement, i.e., to have the jitter in ipv4-l3-protocol rather than in arp-l3-protocol. Having equivalent stuff in different places is confusing. The second one is more a conceptual thing. This bug (and possibly others) are all related to how nodes come to life. The *real* solution would be to have them starting up with a jitter, along with their applications. A possible API would be: - a global parameter to set if the nodes are started with a random jitter and how much it is. - Application might be started with the same idea: apps.Start( Seconds() ) paired with apps.Start (random stream) The first one would start *all* the apps at the same time (precise start). The second would start the apps with a random uniform variable in [min,max). The apps-related thing could be done also now. The node-wide thing is way more complex, as IP (v4 and v6) are missing a "start". They do start their operations as soon as an address is available for them, so basically right away. Maybe the "Start-Stop" feature could enable such a thing, but I'm not that sure. Anyway, as a temporary fix, this looks fine. Forgot one thing. Please separate this patch at least in 2 different patches: 1) AODV-related things 2) IPv4 and IPv6 jitters Ty. Hi Tommaso, Thats right, solving the node-start/stop issue is a better solution. However, since that discussion is still ongoing, we could have a temporary fix. For now, I left a note on Bug1197. -john (In reply to comment #14) > Hi John, > > I checked the IPv6's part and all seems good, so feel free to go on. > > Just some suggestions. > > The first one is about the jitter delay attribute naming and placement. I'd > rather have the same name and the same placement, i.e., to have the jitter in > ipv4-l3-protocol rather than in arp-l3-protocol. Having equivalent stuff in > different places is confusing. > > The second one is more a conceptual thing. > > This bug (and possibly others) are all related to how nodes come to life. The > *real* solution would be to have them starting up with a jitter, along with > their applications. > > A possible API would be: > - a global parameter to set if the nodes are started with a random jitter and > how much it is. > - Application might be started with the same idea: apps.Start( Seconds() ) > paired with apps.Start (random stream) > > The first one would start *all* the apps at the same time (precise start). > The second would start the apps with a random uniform variable in [min,max). > > The apps-related thing could be done also now. > The node-wide thing is way more complex, as IP (v4 and v6) are missing a > "start". They do start their operations as soon as an address is available for > them, so basically right away. Maybe the "Start-Stop" feature could enable such > a thing, but I'm not that sure. > > Anyway, as a temporary fix, this looks fine. Hi John, I've seen the patch has been pushed. However I really don't like too much having two different things in different places (w.r.t. IPv4 and IPv6). Moreover if we keep it like this for 3.18, it will be harder to change it in the future (it would become an API change). Can we try to quickly fix this ? Also, we should document how to revert to the "original" system, for experiment reproducibility and add a huge note on the documentation that something did change in the way ARP, NS and RS are generated. I'll try to work on moving the streams attributes to IPv4 and IPv6 base classes, or, at least, to the InternetStackHelper. However, I'm quite busy tomorrow, so... the first coming with a better solution will win. Cheers, T. (In reply to comment #17) > Hi John, > > I've seen the patch has been pushed. However I really don't like too much > having two different things in different places (w.r.t. IPv4 and IPv6). > Moreover if we keep it like this for 3.18, it will be harder to change it in > the future (it would become an API change). > > Can we try to quickly fix this ? > > Also, we should document how to revert to the "original" system, for experiment > reproducibility and add a huge note on the documentation that something did > change in the way ARP, NS and RS are generated. > > I'll try to work on moving the streams attributes to IPv4 and IPv6 base > classes, or, at least, to the InternetStackHelper. However, I'm quite busy > tomorrow, so... the first coming with a better solution will win. > > Cheers, > > T. Hi Tommaso, Not sure if I have time for this at this point, the best I can do is back out the changes. -john Hi John, don't backout. I'll handle this. It's just a matter of some documentation and a helper. Cheers, T. (In reply to comment #19) > Hi John, > > don't backout. I'll handle this. It's just a matter of some documentation and a > helper. > > Cheers, > > T. Hi Tommaso, In that case, can you clarify specifically what you mean by (can you state the code, now that it is already checked-in) However I really don't like too much > having two different things in different places (w.r.t. IPv4 and IPv6). > Moreover if we keep it like this for 3.18, it will be harder to change it in > the future (it would become an API change). From what I see it is similar in the internet stack helper 1.15 + Ptr<Ipv4> ipv4 = node->GetObject<Ipv4> (); 1.16 + if (ipv4 != 0) 1.17 + { 1.18 + Ptr<ArpL3Protocol> arpL3Protocol = ipv4->GetObject<ArpL3Protocol> (); 1.19 + if (arpL3Protocol != 0) 1.20 + { 1.21 + currentStream += arpL3Protocol->AssignStreams (currentStream); 1.22 + } 1.23 + } 1.24 + Ptr<Ipv6> ipv6 = node->GetObject<Ipv6> (); 1.25 + if (ipv6 != 0) 1.26 + { 1.27 + Ptr<Icmpv6L4Protocol> icmpv6L4Protocol = ipv6->GetObject<Icmpv6L4Protocol> (); 1.28 + if (icmpv6L4Protocol != 0) 1.29 + { 1.30 + currentStream += icmpv6L4Protocol->AssignStreams (currentStream); 1.31 + } 1.32 + } Besides, the similarities between ipv6 and ipv4 are blurred when it comes to neighbor discovery. The protection I am trying to add is specifically for broadcasts and multicasts. 2. To restore the old functionality we could do Config::SetDefault ("ns3::Icmpv6L4Protocol::SolicitationJitter", StringValue ("ns3::UniformRandomVariable[Min=0.0|Max=0.0]")); Config::SetDefault ("ns3::ArpL3Protocol::RequestJitter", StringValue ("ns3::UniformRandomVariable[Min=0.0|Max=0.0]")); 3. I will certainly look at the documentation, once your modifications are in. 4. We could also turn off ipv6 solicitation jitter, until ipv6 and ipv4 in general are looking similar in all areas Hi John, don't worry, IPv4 and IPv6 will never look the same (thanksfully). I meant to have the attributes in Ipv4 and Ipv6 classes. However you're probably right, it's not worth the effort. The user should know the protocols enough to figure out why the attributes are there. Anyway, I did a quick patch to add what I think it's necessary. I checked for IPv6 and it seems to do the job. Feel free to apply it and close the bug. Cheers, T. Created attachment 1662 [details]
Docs and helper functions
Thanks I pushed it , author John Abraham <john.abraham.in@gmail.com> Thu, 15 Aug 2013 09:50:34 -0700 (12 hours ago) changeset 10151 5d5dd1b12bfc author John Abraham <john.abraham.in@gmail.com> Thu, 15 Aug 2013 21:00:57 -0700 (80 minutes ago) changeset 10160 5bf2fe645784 (In reply to comment #22) > Created attachment 1662 [details] > Docs and helper functions |