Bug 780

Summary: OLSR doesn't converge
Product: ns-3 Reporter: Pavel Boyko <boyko>
Component: routingAssignee: ns-bugs <ns-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: gjcarneiro, kevjay, tomh
Priority: P5    
Version: ns-3-dev   
Hardware: All   
OS: All   
Attachments: How to reproduce
patch to enable more debug logging for OLSR
patch to enable more debug logging for OLSR (modified)
bug fix

Description Pavel Boyko 2009-12-19 11:01:49 EST
Created attachment 702 [details]
How to reproduce

OLSR is observed to not converge in simple 3-nodes varying topology. Description by Abdul Jabbar (a little bit modified):

We place 3 nodes in a linear fashion.

 1 ----------------- 2 ----------------- 3
(0,0)         (250,0)             (500,0)
     d=250 m         d=250m

After an initial settling time of 50 seconds (just to be safe), we start
both the traffic and mobility (constant velocity mobility model via
SetVelocity method).

The transmission range is set to 300-350 meters and we send ping traffic from
node 1 to node 2 staring from t = 50 seconds. We then move nodes 2 and 3 such that they end up interchanging their positions...

First we bring node 3 very close to node 2. In fact, we bring 2 to the same
node coordinates as 3.

 1 ----------------- 2 -3

And then move the node 2 away until it reaches the original position of node
3.

 1 ----------------- 3 ----------------- 2
(0,0)            (250,0)         (500,0)

At this point, we set the velocity of all the nodes to zero, till the end of
the simulation. We let the simulation run for couple hundred seconds after
this point. And OLSR still doesn't converge on the routes! Note that
throughout the simulation, there are no network partitions. There is always
a path between every node pair.

Test program is attached. Current PING summary:
--- 10.1.1.2 ping statistics ---
150 packets transmitted, 69 received, 54% packet loss, time 149999ms
rtt min/avg/max/mdev = 2/2.058/6/0.4815 ms

Expected result: 0% packet loss. Note, that the expected result is achieved when OLSR is changed to AODV.
Comment 1 Tom Henderson 2009-12-20 18:05:31 EST
Created attachment 703 [details]
patch to enable more debug logging for OLSR

Here is a start of a patch to improve OLSR debugging capability, which adds a public method RoutingProtocol::Dump() which dumps to the logging output at log level DEBUG.

This method can be scheduled in the test program by including
#include "ns3/olsr-routing-protocol.h"

and by adding statements such as:
  Ptr<olsr::RoutingProtocol> rp0 = adhocNodes.Get (0)->GetObject <olsr::RoutingProtocol> ();
  NS_ASSERT (rp0);
  Ptr<olsr::RoutingProtocol> rp1 = adhocNodes.Get (1)->GetObject <olsr::RoutingProtocol> ();
  NS_ASSERT (rp1);
  Ptr<olsr::RoutingProtocol> rp2 = adhocNodes.Get (2)->GetObject <olsr::RoutingProtocol> ();
  NS_ASSERT (rp2);
  Simulator::Schedule (Seconds (100.0), &olsr::RoutingProtocol::Dump, rp0);
  Simulator::Schedule (Seconds (100.0), &olsr::RoutingProtocol::Dump, rp1);
  Simulator::Schedule (Seconds (100.0), &olsr::RoutingProtocol::Dump, rp2);

then try:
NS_LOG="OlsrRoutingProtocol" ./waf --run abdul
Comment 2 Tom Henderson 2009-12-20 18:10:08 EST
I haven't looked into the cause yet, but here are the symptoms.  The pings are not leaving the leftmost node (numbered 1 in the above ascii art) due to a route lookup failure.

Before the mobility occurs, this is what a dump on node 1 looks like:
100s -1 [node 0] OlsrRoutingProtocol:Dump(): Dumping for node with main address 10.1.1.1
100s -1 [node 0] OlsrRoutingProtocol:Dump():  Neighbor set
100s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.2 status:1 willingness:3
100s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.3 status:1 willingness:3
100s -1 [node 0] OlsrRoutingProtocol:Dump():  Two-hop neighbor set
100s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.2->10.1.1.3
100s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.2->10.1.1.1
100s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.3->10.1.1.2
100s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.3->10.1.1.1
100s -1 [node 0] OlsrRoutingProtocol:Dump():  Routing table
100s -1 [node 0] OlsrRoutingProtocol:Dump():   dest=10.1.1.2 --> next=10.1.1.2 via interface 1
100s -1 [node 0] OlsrRoutingProtocol:Dump():   dest=10.1.1.3 --> next=10.1.1.3 via interface 1


After the nodes have switched places (at time 150), here is a repeat of the dump on node 1:
150s -1 [node 0] OlsrRoutingProtocol:Dump(): Dumping for node with main address 10.1.1.1
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Neighbor set
150s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.2 status:0 willingness:3
150s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.3 status:1 willingness:3
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Two-hop neighbor set
150s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.3->10.1.1.2
150s -1 [node 0] OlsrRoutingProtocol:Dump():   10.1.1.3->10.1.1.1
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Routing table
150s -1 [node 0] OlsrRoutingProtocol:Dump():   dest=10.1.1.3 --> next=10.1.1.3 via interface 1

The next place I would look at is why node 10.1.1.2 is not in the routing table at time 150 despite being in the two-hop neighbor set of node 10.1.1.3.
Comment 3 Gustavo J. A. M. Carneiro 2009-12-21 06:28:02 EST
Tom's patch does not take into account that 2-hop neighbor tuples may exist in an "expired" state.  Also modified to use the existing operator << for printing the tuples, so that all fields are printed.  The result at t=150 is:

150s -1 [node 0] OlsrRoutingProtocol:Dump(): Dumping for node with main address 10.1.1.1
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Neighbor set
150s -1 [node 0] OlsrRoutingProtocol:Dump():   NeighborTuple(neighborMainAddr=10.1.1.2, status=NOT_SYM, willingness=3)
150s -1 [node 0] OlsrRoutingProtocol:Dump():   NeighborTuple(neighborMainAddr=10.1.1.3, status=SYM, willingness=3)
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Two-hop neighbor set
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Routing table
150s -1 [node 0] OlsrRoutingProtocol:Dump():   dest=10.1.1.3 --> next=10.1.1.3 via interface 1
150s -1 [node 0] OlsrRoutingProtocol:Dump(): 

So, basically node 0 has _no_ 2-hop neighbors whatsoever.  Moreover, there's a symmetric link to node 10.1.1.3, therefore it is added to the routing table, but the link to node 10.1.1.2 is not symmetric, i.e. we can hear them but they can't hear us.  The problem is not in the routing table computation, it is elsewhere.  Link sensing is failing...
Comment 4 Gustavo J. A. M. Carneiro 2009-12-21 06:35:52 EST
Created attachment 704 [details]
patch to enable more debug logging for OLSR (modified)

Forget what I said before about 2-hop neighbors, I had a small bug in my modification :P   Here's the correct patch and what it dumps:

150s -1 [node 0] OlsrRoutingProtocol:Dump(): Dumping for node with main address 10.1.1.1
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Neighbor set
150s -1 [node 0] OlsrRoutingProtocol:Dump():   NeighborTuple(neighborMainAddr=10.1.1.2, status=NOT_SYM, willingness=3)
150s -1 [node 0] OlsrRoutingProtocol:Dump():   NeighborTuple(neighborMainAddr=10.1.1.3, status=SYM, willingness=3)
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Two-hop neighbor set
150s -1 [node 0] OlsrRoutingProtocol:Dump():   TwoHopNeighborTuple(neighborMainAddr=10.1.1.3, twoHopNeighborAddr=10.1.1.2, expirationTime=154478525839ns)
150s -1 [node 0] OlsrRoutingProtocol:Dump():   TwoHopNeighborTuple(neighborMainAddr=10.1.1.3, twoHopNeighborAddr=10.1.1.1, expirationTime=154478525839ns)
150s -1 [node 0] OlsrRoutingProtocol:Dump():  Routing table
150s -1 [node 0] OlsrRoutingProtocol:Dump():   dest=10.1.1.3 --> next=10.1.1.3 via interface 1
150s -1 [node 0] OlsrRoutingProtocol:Dump(): 


So, here's the picture that node 0 sees: 10.1.1.1 --- 10.1.1.3 --- 10.1.1.2
                                         10.1.1.1  ---> 10.1.1.2  (assymetric link)

Based on this, the routing table should indeed have also a "10.1.1.2 via 10.1.1.3" entry.  Bug in RoutingTableComputation.
Comment 5 Gustavo J. A. M. Carneiro 2009-12-21 06:57:24 EST
Created attachment 705 [details]
bug fix
Comment 6 Pavel Boyko 2009-12-21 08:34:00 EST
(In reply to comment #5)
> Created an attachment (id=705) [details]
> bug fix

 I confirm that patch fixes the problem. Gustavo, could you convert test case, attached to this bug to trace-based regression test in src/routing/olsr/test/ ?
Comment 7 Gustavo J. A. M. Carneiro 2009-12-21 11:10:43 EST
changeset:   5867:8c65da328ae9
tag:         tip
user:        Gustavo J. A. M. Carneiro  <gjc@inescporto.pt>
date:        Mon Dec 21 16:07:08 2009 +0000
summary:     Fix Bug 780 (problem in RoutingTableComputation with asymetric links), while adding debugging methods to OLSR.  Thanks Abdul Jabbar, Pavel Boyko, and Tom Henderson for test case and debug code.