Bug 400

Summary: Wifi Multi-Hop Network not forwarding packets (at least ARP)
Product: ns-3 Reporter: Luis <cortes>
Component: routing Assignee: Tom Henderson <tomh>
Status: RESOLVED FIXED    
Severity: normal CC: craigdo, gjcarneiro, mathieu.lacage, tomh
Priority: P1 Keywords: bug, feature-request
Version: ns-3-dev   
Hardware: PC   
OS: Linux   
Bug Depends on: 187    
Bug Blocks:    
Attachments: Test script where ns3 fails
patch against ns-3-dev to allow /32 addresses on ad hoc interfaces
patch against test case
patch against ns-3-dev to allow /32 addresses on ad hoc interfaces

Description Luis 2008-11-04 19:09:43 EST
Created attachment 289 [details]
Test script where ns3 fails

Hi,

I am using ns3.2.1.RC2 and I can't set up a multi-hop network as follows:

TOPOLOGY:

     WIRED CSMA             WIRELESS CHANNEL
   ______________
   O            O  ))          (( O ))           (( O
   N3           N0                N1                N2  

Here N1 hears packets from both N2 and N0, but N2 cannot communicate directly with N0. However, when I run it, N2 sends its ARP request but N1 does not forward it to N0, and therefore N2 is not able to communicate. I am using Global Routing and running OLSR on the wireless network.

P.S. Is there any way to have a GOD mode ARP in which everybody knows everybody else's MAC address? If not, can you guys add this feature? It's probably easy to do, but it will take me longer to do than you guys.

Thanks!
Comment 1 Tom Henderson 2008-11-05 01:20:47 EST
(In reply to comment #0)
> Created an attachment (id=289) [details]
> Test script where ns3 fails
> 
> Hi,
> 
> I am using ns3.2.1.RC2 and I can't set up a multi-hop network as follows:
> 
> TOPOLOGY:
> 
>      WIRED CSMA             WIRELESS CHANNEL
>    ______________
>    O            O  ))          (( O ))           (( O
>    N3           N0                N1                N2  
> 
> Here N1 hears packets from both N2 and N0, but N2 cannot communicate directly
> with N0. However, when I run it, N2 sends its ARP request but N1 does not
> forward it to N0, and therefore N2 is not able to communicate. I am using
> Global Routing and running OLSR on the wireless network.

Yes, this will not work.  Proxy ARP is not supported, there is one subnet for the entire adhoc network, and N2 assumes that everyone on the subnet is reachable via one hop (and therefore ARPs for it).
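To make the one-hop assumption concrete, here is a minimal sketch of the on-link test a host applies before it ARPs. It is not ns-3 code: `Ip`, `MaskFromPrefix`, `IsOnLink`, and `ArpTarget` are hypothetical helpers, the next hop is assumed to have been supplied by routing, and the example addresses assume N0 = 192.168.0.1, N1 = 192.168.0.2, N2 = 192.168.0.3 on the 192.168.0.0 wireless subnet (the attached script may number the nodes differently). With a /24 mask, N2 classifies N0 as on-link and ARPs for it directly; with a /32 mask no other address is on-link, so N2 resolves its next hop N1 instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not ns-3 API): IPv4 address as a host-order uint32_t.
constexpr uint32_t Ip(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
  return (a << 24) | (b << 16) | (c << 8) | d;
}

// Netmask for a prefix length, e.g. 24 -> 255.255.255.0, 32 -> 255.255.255.255.
uint32_t MaskFromPrefix(unsigned prefixLen) {
  assert(prefixLen <= 32);
  // Shifting a 32-bit value by 32 is undefined, so /0 is handled separately.
  return prefixLen == 0 ? 0u : ~uint32_t{0} << (32 - prefixLen);
}

// A destination counts as "on-link" when it shares the sender's subnet,
// i.e. both addresses agree on every bit covered by the mask.
bool IsOnLink(uint32_t self, uint32_t dst, unsigned prefixLen) {
  const uint32_t mask = MaskFromPrefix(prefixLen);
  return (self & mask) == (dst & mask);
}

// Address the sender resolves with ARP: the destination itself if it looks
// on-link, otherwise the next hop chosen by routing.
uint32_t ArpTarget(uint32_t self, uint32_t dst, unsigned prefixLen,
                   uint32_t nextHop) {
  return IsOnLink(self, dst, prefixLen) ? dst : nextHop;
}
```

With these assumed addresses, `ArpTarget(n2, n0, 24, n1)` yields N0's address (the ARP that goes unanswered in this topology), while `ArpTarget(n2, n0, 32, n1)` yields N1's.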

If you configure a real network like this, you would run into the same problem, due to the assumption that link and subnet are synonymous, but they are not in this case.

> 
> P.S. Is there any way to have a GOD mode ARP in which everybody knows everybody
> else's MAC address? If not, can you guys add this feature? It's probably easy
> to do, but it will take me longer to do than you guys.

But, you do not want N0's MAC address here because N2 cannot reach N0 anyway.

One solution would be to put /32 addresses on each wireless interface, but I haven't tested that.  That, if OLSR could handle it, would eliminate the ARPing for N0.  It would require some low-level configuration (probably not through the helpers).

I think the answer is to work on a routing solution to this-- I will try to look at it more tomorrow.
Comment 2 Mathieu Lacage 2008-11-05 02:19:58 EST
see bug 187 for 'god' arp.
Comment 3 Luis 2008-11-05 09:58:24 EST
(In reply to comment #1)
> (In reply to comment #0)
> > Created an attachment (id=289) [details] [details]
> > Test script where ns3 fails
> > 
> > Hi,
> > 
> > I am using ns3.2.1.RC2 and I can't set up a multi-hop network as follows:
> > 
> > TOPOLOGY:
> > 
> >      WIRED CSMA             WIRELESS CHANNEL
> >    ______________
> >    O            O  ))          (( O ))           (( O
> >    N3           N0                N1                N2  
> > 
> > Here N1 hears packets from both N2 and N0, but N2 cannot communicate directly
> > with N0. However, when I run it, N2 sends its ARP request but N1 does not
> > forward it to N0, and therefore N2 is not able to communicate. I am using
> > Global Routing and running OLSR on the wireless network.
> 
> Yes, this will not work.  Proxy ARP is not supported, there is one subnet for
> the entire adhoc network, and N2 assumes that everyone on the subnet is
> reachable via one hop (and therefore ARPs for it).
> 
> If you configure a real network like this, you would run into the same problem,
> due to the assumption that link and subnet are synonymous, but they are not in
> this case.
> 

Really? I previously implemented a wireless mesh network using the olsrd (http://www.olsr.org/) and B.A.T.M.A.N. (https://www.open-mesh.net/batman) routing protocols. It was composed of a few laptops running in ad hoc mode on Ubuntu. There was one gateway (N0) to the Internet (N3), two wireless routers (ad hoc nodes) (N1), and an access point (N2). I do remember seeing an ARP request originating from the AP (N2). I do not remember, however, what happened at N1, or how the packet was forwarded to N0.



> > 
> > P.S. Is there any way to have a GOD mode ARP in which everybody knows everybody
> > else's MAC address? If not, can you guys add this feature? It's probably easy
> > to do, but it will take me longer to do than you guys.
> 
> But, you do not want N0's MAC address here because N2 cannot reach N0 anyway.
> 
> One solution would be to put /32 addresses on each wireless interface, but I
> haven't tested that.  That, if OLSR could handle it, would eliminate the ARPing
> for N0.  It would require some low-level configuration (probably not through
> the helpers).

I think this is what happened in my wireless mesh network implementation: each node used its next hop as its default gateway to the Internet. Here is my global routing table for 3 hops:

FROM \ TO     10.0.0.1    10.0.0.2    10.0.0.6    10.0.0.100
10.0.0.1      0           10.0.0.2    10.0.0.2    10.0.0.2
10.0.0.2      10.0.0.1    10.0.0.1    10.0.0.6    10.0.0.6
10.0.0.6      10.0.0.2    10.0.0.2    10.0.0.2    10.0.0.100
10.0.0.100    10.0.0.6    10.0.0.6    10.0.0.6    10.0.0.6


The first column lists the FROM address and the first row lists the TO address. The entry for a FROM and a different TO is the next hop; the entry where FROM and TO are the same address is that node's default gateway. So from 10.0.0.2 to 10.0.0.100 the next hop is 10.0.0.6, and the default gateway for 10.0.0.2 is 10.0.0.1 (row 10.0.0.2, column 10.0.0.2).

The topology was:

Internet             WIRELESS CHANNEL
_______
      O  ))          (( O ))           (( O ))           (( O ))
      N1                N2                N6               N100

where N1 was the gateway, N2 and N6 were wireless nodes, and N100 was the AP. Is there no default gateway implementation in NS3? I'm going to try to set up the IPs manually.

> 
> I think the answer is to work on a routing solution to this-- I will try to
> look at it more tomorrow.
> 

I think multi-hop wireless networks are important to the research community, and it's very important that NS3 has such a capability.

Thanks,
Comment 4 Luis 2008-11-05 18:34:02 EST
I think my explanation was too complicated. So here is what I think it should do.

     WIRED CSMA             WIRELESS CHANNEL
   ______________
   O            O  ))          (( O ))           (( O
   N3           N0                N1                N2  

When N2 is going to transmit to N3, it sends its packets to the next hop, N1, with final destination N3. N2 does this by broadcasting an ARP request for N1. N1 replies, receives the packet, and knows that to reach N3 it must use the next hop N0. Again an ARP request is sent, this time from N1. N0 replies, receives the packet, and knows it has to send it to N3 through interface 0. It broadcasts an ARP request and N3 responds.

Is this correct? That is the only way I see this working on real networks, since ARP is broadcast.
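The hop-by-hop sequence above can be modelled in a few lines. This is a hypothetical sketch, not ns-3 code: `ForwardingTable`, `Network`, and `ArpRequests` are illustrative names, and each node's table is simply assumed to hold the next hop that routing (OLSR or static routes) would have installed. A node with no entry for the destination treats it as directly attached, as N0 does for N3 on the CSMA link.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: each node maps a final destination to the neighbour
// that should receive the packet next.
using ForwardingTable = std::map<std::string, std::string>;
using Network = std::map<std::string, ForwardingTable>;

// Walks a packet from src to dst and records, for every hop, which neighbour
// the transmitting node has to resolve with ARP.
std::vector<std::string> ArpRequests(const Network& net, const std::string& src,
                                     const std::string& dst) {
  assert(net.count(src) == 1);
  std::vector<std::string> arps;
  std::string current = src;
  while (current != dst) {
    const ForwardingTable& table = net.at(current);
    auto it = table.find(dst);
    // No entry: the destination is assumed to be on the attached link.
    std::string next = (it != table.end()) ? it->second : dst;
    arps.push_back(current + " ARPs for " + next);
    current = next;
    // More hops than nodes means the tables contain a loop; stop walking.
    if (arps.size() > net.size()) break;
  }
  return arps;
}
```

For N2 sending to N3 with tables N2: {N3 -> N1}, N1: {N3 -> N0}, and N0 holding no entry for N3, the walk yields three ARP exchanges, one per link ("N2 ARPs for N1", "N1 ARPs for N0", "N0 ARPs for N3"), and no node ever ARPs for an address that is not its neighbour.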

Mathieu, setting the nodes up with different /32 addresses makes the global routing crash. I changed

  Ipv4InterfaceContainer wifiInterfaces;
  wifiInterfaces = address.Assign (wifiDevices);
  
To

  address.Assign (wifiDevices.Get(0));
  address.NewNetwork();
  address.Assign (wifiDevices.Get(1));
  address.NewNetwork();
  address.Assign (wifiDevices.Get(2));

and Global routing crashes...


Comment 5 Tom Henderson 2008-11-06 00:34:55 EST
(In reply to comment #4)
> I think my explanation was too complicated. So here is what I think it should
> do.
> 
>      WIRED CSMA             WIRELESS CHANNEL
>    ______________
>    O            O  ))          (( O ))           (( O
>    N3           N0                N1                N2  
> 
> When N2 is going to transmit to N3, it sends its packets to the next hop,
> N1, with final destination N3. N2 does this by broadcasting an ARP request
> for N1. N1 replies, receives the packet, and knows that to reach N3 it must
> use the next hop N0. Again an ARP request is sent, this time from N1. N0
> replies, receives the packet, and knows it has to send it to N3 through
> interface 0. It broadcasts an ARP request and N3 responds.
> 
> Is this correct? That is the only way I see this working on real networks,
> since ARP is broadcast.

I agree.

> 
> Mathieu, setting the nodes up with different /32 addresses makes the global
> routing crash. I changed
> 
>   Ipv4InterfaceContainer wifiInterfaces;
>   wifiInterfaces = address.Assign (wifiDevices);
> 
> To
> 
>   address.Assign (wifiDevices.Get(0));
>   address.NewNetwork();
>   address.Assign (wifiDevices.Get(1));
>   address.NewNetwork();
>   address.Assign (wifiDevices.Get(2));
> 
> and Global routing crashes...
> 

I will try to debug the global routing problem tomorrow.
Comment 6 Tom Henderson 2008-11-06 09:45:40 EST
(In reply to comment #4)

> and Global routing crashes...
> 

I would recommend to avoid using global routing in this script, and just set a default route to node n0 from node n3, and use Olsr on the ad hoc network.  See Ipv4::SetDefaultRoute().

Olsr still has the problem though of using a multi-link subnet when addresses from the same subnet are assigned, which leads to the ARP problems you saw earlier.  When global routing is removed, one can then do the following:

-  address.SetBase("192.168.0.0","255.255.255.0");
+  address.SetBase("192.168.0.0","255.255.255.255");
  Ipv4InterfaceContainer wifiInterfaces;
  wifiInterfaces = address.Assign (wifiDevices);

and the wifi interfaces will get /32 addresses.  However, although the problematic ARPs disappear, OLSR does not seem to like it (can't find a route), so that needs to be debugged next.
Comment 7 Luis 2008-11-06 14:00:34 EST
(In reply to comment #6)
> (In reply to comment #4)
> 
> > and Global routing crashes...
> > 
> 
> I would recommend to avoid using global routing in this script, and just set a
> default route to node n0 from node n3, and use Olsr on the ad hoc network.  See
> Ipv4::SetDefaultRoute().
> 
> Olsr still has the problem though of using a multi-link subnet when addresses
> from the same subnet are assigned, which leads to the ARP problems you saw
> earlier.  When global routing is removed, one can then do the following:
> 
> -  address.SetBase("192.168.0.0","255.255.255.0");
> +  address.SetBase("192.168.0.0","255.255.255.255");
>   Ipv4InterfaceContainer wifiInterfaces;
>   wifiInterfaces = address.Assign (wifiDevices);

Thanks for the suggestion, but I get the following:

Compilation finished successfully 
Ipv4AddressHelper::SetBase(): Inconsistent address and mask
Command ['/cluster/maniacs/cortes/ns3/ns-3.2.1-RC2/build/debug/scratch/wifi-mesh'] exited with code -11 

Setting up the static routes took me a little bit since I'm not used to the NS3 style. Is the following correct? Is there a better way?

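  // Node 2: default route via the address assigned to wifiInterfaces entry 1
  Ptr<Ipv4> ipv4 = meshNodes.Get(2)->GetObject<Ipv4> ();
  ipv4->SetDefaultRoute(wifiInterfaces.GetAddress (1),0);
  // Node 1: default route via the address assigned to wifiInterfaces entry 0
  ipv4 = meshNodes.Get(1)->GetObject<Ipv4> ();
  ipv4->SetDefaultRoute(wifiInterfaces.GetAddress (0),0);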

Thanks
> 
> and the wifi interfaces will get /32 addresses.  However, although the
> problematic ARPs disappear, OLSR does not seem to like it (can't find a route),
> so that needs to be debugged next.
> 

Comment 8 Gustavo J. A. M. Carneiro 2008-11-08 06:53:36 EST
N3 does not have OLSR running, therefore N2 will never be able to contact it, even if /32 addresses are used in OLSR. And /32 addresses "should" work in OLSR, and are actually the recommended way of using OLSR. I tested this in the past and it used to work, at least in my "grid" scenario where I tested a few hundred nodes...
Comment 9 Gustavo J. A. M. Carneiro 2008-11-08 13:01:54 EST
OK, corrections on previous comments.  Actually I made a mistake and what my working wireless OLSR code used to do was:

  1. Use /16 addresses for the nodes;
  2. Make sure the nodes do not have a "network route" for the network they are attached to.

So, basically, suppose you have two nodes, N1, N2.

N1:
   Interface: 10.0.0.1/16
   static routing table: empty

N2:
   Interface: 10.0.0.2/16
   static routing table: empty

In this case, when N2 wants to contact 10.0.0.1 it asks Ipv4 routing. Ipv4 routing iterates over its list of "routing protocols" (see class Ipv4RoutingProtocol in src/node/ipv4.h). The first "routing protocol" is always static routing (not really a protocol, just a routing table). In this case static routing cannot find a route for 10.0.0.1. Next, Ipv4 asks the second routing protocol, OLSR, and OLSR should be able to return a route for 10.0.0.1.

This is why it is crucial that the network interface does not have static routing entries that match any of the other nodes. One way to accomplish this is to use /16 (or /whatever < 32) interface addresses while making sure the /16 network route is removed; another option is to use /32 interface addresses. But the method I have previously tested was with /16 addresses.
Comment 10 Tom Henderson 2008-11-08 14:51:34 EST
(In reply to comment #8)
> N3 does not have OLSR running, therefore N2 will never be able to contact it,
> even if /32 addresses are used in OLSR. And /32 addresses "should" work in
> OLSR, and are actually the recommended way of using OLSR.

I couldn't find any reference to /32 being the recommended way to run OLSR, and someone told me that the IETF went out of its way to avoid such a recommendation.  I will ask some people at the upcoming IETF about this.

I believe that either /32 or /24 should work in a real implementation.  If there is not a /32, then OLSR will still populate a host route (next hop) that overrides the default destination "on-link" network route, and the system will ARP for that next hop instead.
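The override described above is ordinary longest-prefix matching. The sketch below is a hypothetical model (the `Route` struct, `Mask`, `Lookup`, and `ArpTarget` are illustrative, not ns-3 types) and assumes N0 = 192.168.0.1 and N1 = 192.168.0.2 for illustration. A /24 on-link route alone makes N2 ARP for N0 directly; once a /32 host route to N0 via N1 is present, it wins because it is more specific, and N2 ARPs for N1 instead. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical route entry. A gateway of 0 means "on-link": the destination
// itself is resolved with ARP.
struct Route {
  uint32_t dest;       // network or host address (host byte order)
  unsigned prefixLen;  // e.g. 24 for a subnet route, 32 for a host route
  uint32_t gateway;    // next hop, or 0 when on-link
};

uint32_t Mask(unsigned prefixLen) {
  return prefixLen == 0 ? 0u : ~uint32_t{0} << (32 - prefixLen);
}

// Longest-prefix match: among all routes covering dst, keep the most specific.
std::optional<Route> Lookup(const std::vector<Route>& table, uint32_t dst) {
  std::optional<Route> best;
  for (const Route& r : table) {
    const uint32_t m = Mask(r.prefixLen);
    if ((dst & m) == (r.dest & m) && (!best || r.prefixLen > best->prefixLen)) {
      best = r;
    }
  }
  return best;
}

// Address resolved with ARP for a packet to dst.
uint32_t ArpTarget(const std::vector<Route>& table, uint32_t dst) {
  std::optional<Route> r = Lookup(table, dst);
  assert(r.has_value());  // this sketch assumes some route always matches
  return r->gateway == 0 ? dst : r->gateway;
}
```

Starting from a table holding only `{0xC0A80000, 24, 0}` (192.168.0.0/24, on-link), the lookup for N0 returns N0 itself; after adding `{N0, 32, N1}`, the same lookup returns N1.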

Putting a /24 (or other subnet mask) on an ad-hoc 802.11 interface will often lead to a "multi-link subnet":  http://ietfreport.isoc.org/rfc/rfc4903.txt
This may or may not be problematic, depending on how the system is used.  For instance, sending a subnet-directed broadcast with TTL 1 on a multilink subnet may not reach all nodes, depending on the adhoc routing being used.
Comment 11 Tom Henderson 2008-11-08 15:29:42 EST
(In reply to comment #9)
> OK, corrections on previous comments.  Actually I made a mistake and what my
> working wireless OLSR code used to do was:
> 
>   1. Use /16 addresses for the nodes;
>   2. Make sure the nodes do not have a "network route" for the network they are
> attached to.
> 
> So, basically, suppose you have two nodes, N1, N2.
> 
> N1:
>    Interface: 10.0.0.1/16
>    static routing table: empty
> 
> N2:
>    Interface: 10.0.0.2/16
>    static routing table: empty
> 
> In this case, when N2 wants to contact 10.0.0.1 it asks Ipv4 routing.  Ipv4
> routing iterates over its list of "routing protocols" (see class
> Ipv4RoutingProtocol in src/node/ipv4.h).  The first "routing protocol" is
> always the static routing (not really a protocol, just a routing table).  In
> this case static routing cannot find a route for 10.0.0.1.  Next ipv4 asks the
> second routing protocol, which is OLSR, and OLSR should be able to return a
> route for 10.0.0.1.

Actually, since OLSR is being entered at priority 10, it is being consulted before static routing at priority 0.

  // Add OLSR as routing protocol, with slightly higher priority than
  // static routing.
  m_ipv4->AddRoutingProtocol (m_routingTable, 10);
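A minimal model of this ordering, including what happens before OLSR has converged, follows. It is a sketch under stated assumptions, not the ns-3 `Ipv4RoutingProtocol` interface: `Protocol` and `RouteOutput` are hypothetical names, protocols are consulted in descending priority (higher value first, as with OLSR at 10 and static routing at 0), and the first one that returns a route wins. Static routing is modelled as answering "on-link" for every destination, as a subnet route would.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical routing-protocol entry: a name, a priority, and a lookup that
// returns a next hop (or nothing if it has no route).
struct Protocol {
  std::string name;
  int priority;  // higher value = consulted earlier
  std::function<std::optional<std::string>(const std::string&)> lookup;
};

// Consults protocols from highest to lowest priority; the first answer wins.
std::optional<std::string> RouteOutput(std::vector<Protocol> protocols,
                                       const std::string& dst) {
  std::stable_sort(protocols.begin(), protocols.end(),
                   [](const Protocol& a, const Protocol& b) {
                     return a.priority > b.priority;
                   });
  for (const Protocol& p : protocols) {
    assert(p.lookup && "protocol without a lookup function");
    if (std::optional<std::string> nextHop = p.lookup(dst)) {
      return p.name + ": " + *nextHop;
    }
  }
  return std::nullopt;
}
```

If OLSR is modelled as returning "N1" for N0 only after convergence, `RouteOutput` first answers "static: on-link" (so the host would ARP for a destination that may be several hops away) and, once OLSR has a route, answers "olsr: N1", because the priority-10 entry is asked first.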

Comment 12 Tom Henderson 2008-11-08 15:34:46 EST
Created attachment 295 [details]
patch against ns-3-dev to allow /32 addresses on ad hoc interfaces

This patch is against ns-3-dev and is a collection of changesets that enable correct behavior when a /32 address (i.e., mask of 255.255.255.255) is assigned to an Ipv4Interface.

It works for ns-3.2.1-RC3 as well (although the patch does not apply cleanly in all cases and some .rej files must be hand-merged).
Comment 13 Tom Henderson 2008-11-08 15:36:31 EST
Created attachment 296 [details]
patch against test case

This patch of the original test script, combined with the simulator patch, will allow the test case to run.

- change interface addresses in adhoc network to /32
- remove global routing; use OLSR on all nodes
- allow OLSR routing to converge before starting application traffic (namely, start application traffic at time 10 seconds)
Comment 14 Gustavo J. A. M. Carneiro 2008-11-08 16:10:51 EST
(In reply to comment #11)
> Actually, since OLSR is being entered at priority 10, it is being consulted
> before static routing at priority 0.
> 
>   // Add OLSR as routing protocol, with slightly higher priority than
>   // static routing.
>   m_ipv4->AddRoutingProtocol (m_routingTable, 10);
> 

Ah, right, I forgot that higher numbers mean higher priorities here.

Anyway, in that case it may actually work with /16 addresses.  However, it could be confusing because if applications start sending traffic before the routes are discovered then OLSR will be asked about a route but not find one, then static routing table will tell IPv4 that N3 is in the local network, then ARP kicks in, erroneously, leading to bug reports such as this.

But, to be honest, I don't have any formed opinion on how we could make OLSR more intuitive to use. Even I can get confused by it, as shown in this bug report :-/
Comment 15 Tom Henderson 2008-11-10 14:20:04 EST
(In reply to comment #14)
> (In reply to comment #11)
> > Actually, since OLSR is being entered at priority 10, it is being consulted
> > before static routing at priority 0.
> > 
> >   // Add OLSR as routing protocol, with slightly higher priority than
> >   // static routing.
> >   m_ipv4->AddRoutingProtocol (m_routingTable, 10);
> > 
> 
> Ah, right, I forgot that higher numbers mean higher priorities here.
> 
> Anyway, in that case it may actually work with /16 addresses.  However, it
> could be confusing because if applications start sending traffic before the
> routes are discovered then OLSR will be asked about a route but not find one,
> then static routing table will tell IPv4 that N3 is in the local network, then
> ARP kicks in, erroneously, leading to bug reports such as this.

Yes, I tested the code with my patch above, and it also works with subnets now too.  But the ARP behavior you describe will occur if data starts to flow before routing has a chance to converge.

> 
> But, to be honest I don't have any formed opinion on how we could make OLSR
> more intuitive to use. Even I can get confused by it, as shown in this bug
> report :-/
> 

I was thinking that some kind of trace hook for routing table change notifications, and some helper function to connect these to an output file, would be very helpful.  I will look into this.
Comment 16 Tom Henderson 2008-11-29 18:55:45 EST
Created attachment 316 [details]
patch against ns-3-dev to allow /32 addresses on ad hoc interfaces

revised patch, ready to apply, I believe.  Main change is more doxygen in ipv4-address.h.

I did not add an "Ipv4Mask::IsOnes()" method since there are no corresponding "Is..()" methods for other things.
Comment 17 Tom Henderson 2008-12-02 00:40:31 EST
Fixed in changeset 0339a8ad5983