Bug 1522

Summary: Hidden node scenario leads to ARP failure
Product: ns-3 Reporter: Tom Henderson <tomh>
Component: internetAssignee: George Riley <riley>
Status: RESOLVED FIXED    
Severity: normal CC: john.abraham.in, nikkipui, ns-bugs, tommaso.pecorella
Priority: P5    
Version: pre-release   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 1190, 1606, 1629    
Attachments: A simulation that shows the problem.
Receive rate (Fixed Arp vs Random Arp)
Introduce jitter patch
chart comparison with patch
Diff with UniformVariable as StringValue and IPv6 solicitation jitter
Patch with minor modification to ipv6
Sample program for testing phy rx drop with ipv6
Docs and helper functions

Description Tom Henderson 2012-11-06 08:52:24 EST
Reported by John Abraham:
https://groups.google.com/forum/#!msg/ns-3-users/rQWvmME9v04/d8WjJKu5u_4J
Comment 1 Daniel L. 2012-12-07 15:11:49 EST
Created attachment 1484 [details]
A simulation that shows the problem.
Comment 2 John Abraham 2013-07-08 18:34:18 EDT
From my understanding of the linux 2.6 implementation of neighbor.c, there is no randomness explicitly added during the retry of ARP requests.


http://lxr.free-electrons.com/source/net/core/neighbour.c


931         } else {
932                 /* NUD_PROBE|NUD_INCOMPLETE */
933                 next = now + neigh->parms->retrans_time;
934         }

Such a scenario is not reproducible in the real-world due to randomness at layers not directly related layer 2 such as
1. time at which the application in a node starts traffic
2. current processing state of a node (the time it takes for a node to actually request the kernel protocol module is a function of the CPU load at that time).

I wonder if randomness introduced at application sources helps in this simulation.
Comment 3 John Abraham 2013-07-09 12:13:38 EDT
Created attachment 1626 [details]
Receive rate (Fixed Arp vs Random Arp)
Comment 4 John Abraham 2013-07-09 12:14:11 EDT
After refreshing my memory on this case:

Here randomizing the start time of the application sources won't help. The ARP collision here is because 2 nodes B and C received RREQ at the same time from A (which is possible due to speed of light propagation delay). 
A wants the route to D.
B and C both have routes to D and want to reply to A via an unicast RREP. Both B and C initiate ARP  bcast, which keeps colliding at A. ARP retries by B and C are scheduled at exactly the same time, and the collisions continue.

Due to this perfect synchronization, Carrier Sense may not work, as each node wait for exactly the same DIFS, starting their timers at the same time.

1. In the real world "random node processing delay" can help bypass these collisions.
2.  The next question is where do we put this processing delay ?
      a. Before the frame enters the channel or getting a lock for the network adapter?
      b. Before the routing layers requests the arp layer/neighbor management module?
      c. Delay while a node searches its extensive arp cache?
      d. Delay in the interface between user and kernel space modules.


So, I picked one and  tried to introduce a random (1..10 ms) delay between arp retries, I found that:
manet-routing-compare, shows a marginal improvement in average throughput (9.085490196 for fixed ARP) , (9.276235294 for random ARP).
See also attached picture

What this does prove is that
1. the throughput for the manet-routing-compare example is impacted by arp collision.

It does not prove:
1. Random ARP retry improves the throughput. It may /may not.

In any case, I think we need a model for node-processing delay, somewhere.
How did ns-2 deal with this?
Comment 5 John Abraham 2013-07-11 19:06:16 EDT
Created attachment 1628 [details]
Introduce jitter patch

Before patch:

John-Abrahams-MacBook-Pro:ns-3-dev remote$ export NS_LOG=UdpEchoServerApplication
John-Abrahams-MacBook-Pro:ns-3-dev remote$ ./waf --run test-arp &>a.out
John-Abrahams-MacBook-Pro:ns-3-dev remote$ grep Echoing a.out | wc -l
       0

manet-routing-compare received packets:
olsr: 2602
aodv: 1822
dsdv: 1784
dsr:  2872



After patch:


John-Abrahams-MacBook-Pro:ns-3-dev remote$ export NS_LOG=UdpEchoServerApplication
John-Abrahams-MacBook-Pro:ns-3-dev remote$ ./waf --run "test-arp" &> a.out
John-Abrahams-MacBook-Pro:ns-3-dev remote$ grep Echoing a.out | wc -l
     100


manet-routing-compare received packets:
olsr: 2650
aodv: 2040
dsdv: 1958
dsr:  3486




Description of patch:
ArpL3Protocol does not immediately send an ARP request. It will wait for a duration defined
by a uniform random-variable between 0 and m_maxRequestJitter.
This should provide the desired node-processing delay.
Comment 6 John Abraham 2013-07-11 21:54:28 EDT
Created attachment 1629 [details]
chart comparison with patch
Comment 7 Tom Henderson 2013-07-12 00:31:46 EDT
A couple of comments on the patch.

If you instead code the attribute as a StringValue like this (example drawn from random-waypoint-mobility-model.cc):

    .AddAttribute ("Speed",
                   "A random variable used to pick the speed of a random waypoint model.",
                   StringValue ("ns3::UniformRandomVariable[Min=0.3|Max=0.7]"),
                   MakePointerAccessor (&RandomWaypointMobilityModel::m_speed),
                   MakePointerChecker<RandomVariableStream> ())

you will be able to access not only the time value but also the stream number in the config-store system, e.g.:

value /$ns3::NodeListPriv/NodeList/0/$ns3::Node/$ns3::RandomWaypointMobilityModel/Speed/$ns3::UniformRandomVariable/Min "0"
value /$ns3::NodeListPriv/NodeList/0/$ns3::Node/$ns3::RandomWaypointMobilityModel/Speed/$ns3::UniformRandomVariable/Max "20"
value /$ns3::NodeListPriv/NodeList/0/$ns3::Node/$ns3::RandomWaypointMobilityModel/Speed/$ns3::UniformRandomVariable/Stream "2"
value /$ns3::NodeListPriv/NodeList/0/$ns3::Node/$ns3::RandomWaypointMobilityModel/Speed/$ns3::UniformRandomVariable/Antithetic "false"

Also, we should fix IPv6 neighbor solicitation likewise.

I agree in general with trying to add this variability, but we'll have to document this prominently as changed behavior and it will perturb the few trace-based tests that we have.
Comment 8 John Abraham 2013-07-12 04:39:27 EDT
Created attachment 1630 [details]
Diff with UniformVariable as StringValue and IPv6 solicitation jitter

Diff with UniformVariable as StringValue and IPv6 solicitation jitter.

I had to create a function "DelayedSendMessage" due to overload ambiguity when using function pointer.
The protection here for ipv6 is for solicitations which use multicasts.

Any suggestions on how to test the ipv6 patch? AODV only supports ipv4
Comment 9 Tommaso Pecorella 2013-07-16 14:04:27 EDT
I guess that the patch can be checked without a routing protocol like AODV. StaticRouting do RS/RA and NS/NA as well.

Simply using examples/ipv6/icmpv6-redirect will give you all the messages. Probably you'll need to add some more nodes to see "collisions", tho.
Comment 10 John Abraham 2013-08-12 00:03:12 EDT
(In reply to comment #9)
> I guess that the patch can be checked without a routing protocol like AODV.
> StaticRouting do RS/RA and NS/NA as well.
> 
> Simply using examples/ipv6/icmpv6-redirect will give you all the messages.
> Probably you'll need to add some more nodes to see "collisions", tho.

I have a wifi simulation which uses ipv6, unfortunately the ipv6 patch improves the situation in one case and degrades it another case.
Needs further test on ipv6 part
Comment 11 John Abraham 2013-08-12 13:02:00 EDT
(In reply to comment #10)
> (In reply to comment #9)
> > I guess that the patch can be checked without a routing protocol like AODV.
> > StaticRouting do RS/RA and NS/NA as well.
> > 
> > Simply using examples/ipv6/icmpv6-redirect will give you all the messages.
> > Probably you'll need to add some more nodes to see "collisions", tho.
> 
> I have a wifi simulation which uses ipv6, unfortunately the ipv6 patch improves
> the situation in one case and degrades it another case.
> Needs further test on ipv6 part




I have changed the ipv6 part of the patch to add some logs
I have an adapted version of wifi-hidden-terminal for ipv6 using only EchoClient/EchoServer

The drop at Wifi Phy
With patch
RxDrop Count : 640


Without patch
RxDrop Count : 669

I have attached a new patch and sample program.

But I feel more tests need to be done. And I don't normally work with ipv6
Comment 12 John Abraham 2013-08-12 13:03:02 EDT
Created attachment 1657 [details]
Patch with minor modification to ipv6
Comment 13 John Abraham 2013-08-12 13:03:54 EDT
Created attachment 1658 [details]
Sample program for testing phy rx drop with ipv6
Comment 14 Tommaso Pecorella 2013-08-15 04:41:20 EDT
Hi John,

I checked the IPv6's part and all seems good, so feel free to go on.

Just some suggestions.

The first one is about the jitter delay attribute naming and placement. I'd rather have the same name and the same placement, i.e., to have the jitter in ipv4-l3-protocol rather than in arp-l3-protocol. Having equivalent stuff in different places is confusing.

The second one is more a conceptual thing.

This bug (and possibly others) are all related to how nodes come to life. The *real* solution would be to have them starting up with a jitter, along with their applications.

A possible API would be:
- a global parameter to set if the nodes are started with a random jitter and how much it is.
- Application might be started with the same idea: apps.Start( Seconds() ) paired with apps.Start (random stream)

The first one would start *all* the apps at the same time (precise start).
The second would start the apps with a random uniform variable in [min,max).

The apps-related thing could be done also now.
The node-wide thing is way more complex, as IP (v4 and v6) are missing a "start". They do start their operations as soon as an address is available for them, so basically right away. Maybe the "Start-Stop" feature could enable such a thing, but I'm not that sure.

Anyway, as a temporary fix, this looks fine.
Comment 15 Tommaso Pecorella 2013-08-15 04:42:54 EDT
Forgot one thing.

Please separate this patch at least in 2 different patches:
1) AODV-related things
2) IPv4 and IPv6 jitters

Ty.
Comment 16 John Abraham 2013-08-15 10:25:01 EDT
Hi Tommaso,
Thats right, solving the node-start/stop issue is a better solution.
However, since that discussion is still ongoing, we could have a temporary fix.
For now, I left a note on Bug1197.

-john


(In reply to comment #14)
> Hi John,
> 
> I checked the IPv6's part and all seems good, so feel free to go on.
> 
> Just some suggestions.
> 
> The first one is about the jitter delay attribute naming and placement. I'd
> rather have the same name and the same placement, i.e., to have the jitter in
> ipv4-l3-protocol rather than in arp-l3-protocol. Having equivalent stuff in
> different places is confusing.
> 
> The second one is more a conceptual thing.
> 
> This bug (and possibly others) are all related to how nodes come to life. The
> *real* solution would be to have them starting up with a jitter, along with
> their applications.
> 
> A possible API would be:
> - a global parameter to set if the nodes are started with a random jitter and
> how much it is.
> - Application might be started with the same idea: apps.Start( Seconds() )
> paired with apps.Start (random stream)
> 
> The first one would start *all* the apps at the same time (precise start).
> The second would start the apps with a random uniform variable in [min,max).
> 
> The apps-related thing could be done also now.
> The node-wide thing is way more complex, as IP (v4 and v6) are missing a
> "start". They do start their operations as soon as an address is available for
> them, so basically right away. Maybe the "Start-Stop" feature could enable such
> a thing, but I'm not that sure.
> 
> Anyway, as a temporary fix, this looks fine.
Comment 17 Tommaso Pecorella 2013-08-15 17:31:55 EDT
Hi John,

I've seen the patch has been pushed. However I really don't like too much having two different things in different places (w.r.t. IPv4 and IPv6). Moreover if we keep it like this for 3.18, it will be harder to change it in the future (it would become an API change).

Can we try to quickly fix this ?

Also, we should document how to revert to the "original" system, for experiment reproducibility and add a huge note on the documentation that something did change in the way ARP, NS and RS are generated.

I'll try to work on moving the streams attributes to IPv4 and IPv6 base classes, or, at least, to the InternetStackHelper. However, I'm quite busy tomorrow, so... the first coming with a better solution will win.

Cheers,

T.
Comment 18 John Abraham 2013-08-15 17:45:12 EDT
(In reply to comment #17)
> Hi John,
> 
> I've seen the patch has been pushed. However I really don't like too much
> having two different things in different places (w.r.t. IPv4 and IPv6).
> Moreover if we keep it like this for 3.18, it will be harder to change it in
> the future (it would become an API change).
> 
> Can we try to quickly fix this ?
> 
> Also, we should document how to revert to the "original" system, for experiment
> reproducibility and add a huge note on the documentation that something did
> change in the way ARP, NS and RS are generated.
> 
> I'll try to work on moving the streams attributes to IPv4 and IPv6 base
> classes, or, at least, to the InternetStackHelper. However, I'm quite busy
> tomorrow, so... the first coming with a better solution will win.
> 
> Cheers,
> 
> T.

Hi Tommaso,
Not sure if I have time for this at this point, the best I can do is back out the changes. 
-john
Comment 19 Tommaso Pecorella 2013-08-15 19:08:50 EDT
Hi John,

don't backout. I'll handle this. It's just a matter of some documentation and a helper.

Cheers,

T.
Comment 20 John Abraham 2013-08-15 20:10:44 EDT
(In reply to comment #19)
> Hi John,
> 
> don't backout. I'll handle this. It's just a matter of some documentation and a
> helper.
> 
> Cheers,
> 
> T.

Hi Tommaso,
In that case, can you clarify specifically what you mean by (can you state the code, now that it is already checked-in)

However I really don't like too much
> having two different things in different places (w.r.t. IPv4 and IPv6).
> Moreover if we keep it like this for 3.18, it will be harder to change it in
> the future (it would become an API change).



 From what I see it is similar in the internet stack helper
    1.15 +      Ptr<Ipv4> ipv4 = node->GetObject<Ipv4> ();
    1.16 +      if (ipv4 != 0)
    1.17 +        {
    1.18 +          Ptr<ArpL3Protocol> arpL3Protocol = ipv4->GetObject<ArpL3Protocol> ();
    1.19 +          if (arpL3Protocol != 0)
    1.20 +            {
    1.21 +              currentStream += arpL3Protocol->AssignStreams (currentStream);
    1.22 +            }
    1.23 +        }
    1.24 +      Ptr<Ipv6> ipv6 = node->GetObject<Ipv6> ();
    1.25 +      if (ipv6 != 0)
    1.26 +        {
    1.27 +          Ptr<Icmpv6L4Protocol> icmpv6L4Protocol = ipv6->GetObject<Icmpv6L4Protocol> ();
    1.28 +          if (icmpv6L4Protocol != 0)
    1.29 +            {
    1.30 +              currentStream += icmpv6L4Protocol->AssignStreams (currentStream);
    1.31 +            }
    1.32 +        }


Besides, the similarities between ipv6 and ipv4 are blurred when it comes to neighbor discovery. 

The protection I am trying to add is specifically for broadcasts and multicasts.


2. To restore the old functionality we could do

Config::SetDefault ("ns3::Icmpv6L4Protocol::SolicitationJitter", StringValue ("ns3::UniformRandomVariable[Min=0.0|Max=0.0]"));
  Config::SetDefault ("ns3::ArpL3Protocol::RequestJitter", StringValue ("ns3::UniformRandomVariable[Min=0.0|Max=0.0]"));


3. I will certainly look at the documentation, once your modifications are in.

4. We could also turn off ipv6 solicitation jitter, until ipv6 and ipv4 in general are looking similar in all areas
Comment 21 Tommaso Pecorella 2013-08-15 20:26:48 EDT
Hi John,

don't worry, IPv4 and IPv6 will never look the same (thanksfully). I meant to have the attributes in Ipv4 and Ipv6 classes. However you're probably right, it's not worth the effort.
The user should know the protocols enough to figure out why the attributes are there. 

Anyway, I did a quick patch to add what I think it's necessary. I checked for IPv6 and it seems to do the job. Feel free to apply it and close the bug.

Cheers,

T.
Comment 22 Tommaso Pecorella 2013-08-15 20:27:24 EDT
Created attachment 1662 [details]
Docs and helper functions
Comment 23 John Abraham 2013-08-16 01:22:13 EDT
Thanks I pushed it ,


author	John Abraham <john.abraham.in@gmail.com>
Thu, 15 Aug 2013 09:50:34 -0700 (12 hours ago)
changeset 10151	5d5dd1b12bfc



author	John Abraham <john.abraham.in@gmail.com>
Thu, 15 Aug 2013 21:00:57 -0700 (80 minutes ago)
changeset 10160	5bf2fe645784


(In reply to comment #22)
> Created attachment 1662 [details]
> Docs and helper functions