Bug 740 - OLSR MprCompute () works wrong
: OLSR MprCompute () works wrong
Status: RESOLVED FIXED
: ns-3
routing
: ns-3-dev
: All All
: P5 normal
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2009-11-12 11:10 EDT by
Modified: 2009-11-19 10:19 EDT (History)


Attachments
Test-case (7.66 KB, patch)
2009-11-12 11:10 EDT, Kirill Andreev
Details | Diff
Propsed fix (1.55 KB, patch)
2009-11-12 11:11 EDT, Kirill Andreev
Details | Diff
Test-case (7.66 KB, patch)
2009-11-12 11:14 EDT, Kirill Andreev
Details | Diff
Fix #2 (3.37 KB, patch)
2009-11-17 07:50 EDT, Pavel Boyko
Details | Diff
Fix #2, to be applied above first fix. (3.88 KB, patch)
2009-11-17 07:52 EDT, Pavel Boyko
Details | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2009-11-12 11:10:36 EDT
Created an attachment (id=650) [details]
Test-case

A simple tes-tcase (3x3 grid) was conducted and MPRs are selelcted wrong.
Test-case is attached and proposal fix is attached
------- Comment #1 From 2009-11-12 11:11:02 EDT -------
Created an attachment (id=651) [details]
Propsed fix
------- Comment #2 From 2009-11-12 11:14:05 EDT -------
Created an attachment (id=652) [details]
Test-case
------- Comment #3 From 2009-11-12 16:20:26 EDT -------
(In reply to comment #2)
> Created an attachment (id=652) [details] [details]
> Test-case

You seemingly read my mind-- I wanted to create a similar test case and rename
the existing one to OLSR header (as you did), based on the previous bug 733
that was fixed this week.  The only different that I had in mind would be to
instead export a trace source for "MprChanges" that exported the MPR set every
time it changed, and hook that.  But the existing GetMprSet () works fine too.

+1
------- Comment #4 From 2009-11-13 09:32:58 EDT -------
Hi, I am the (mostly-absent) maintainer of the OLSR code, but I see the patch
was committed without me having time to even look at the issue, much less
approve.  While I love that a unit test was added, I have serious doubts about
the patch:

     2.1 --- a/src/routing/olsr/olsr-routing-protocol.cc    Thu Nov 12 14:08:51
2009 +0100
     2.2 +++ b/src/routing/olsr/olsr-routing-protocol.cc    Fri Nov 13 11:47:02
2009 +0300
     2.3 @@ -716,6 +716,10 @@
     2.4                if (max == NULL || nb_tuple->willingness >
max->willingness)
     2.5                  {
     2.6                    max = nb_tuple;
     2.7 +                  for (TwoHopNeighborSet::iterator newCovered =
N2.begin (); newCovered != N2.end (); newCovered++)
     2.8 +                    {
     2.9 +                      coveredTwoHopNeighbors.insert
(newCovered->twoHopNeighborAddr);
    2.10 +                    }
    2.11                    max_r = r;
    2.12                  }
    2.13                else if (nb_tuple->willingness == max->willingness)

We are inside a typical for loop that searches for a "maximum" on a set of
values.  Your change assumes that, each time a new maximum is found it will be
selected, and immediately updates the coveredTwoHopNeighbors with the
assumption that the candidate will be selected.  However, as the search for
maximum continues, a new maximum may be found...  This leads to potentially
prematurely considering some 2-hop neighbors already covered when in fact they
are not.

    2.14 @@ -740,11 +744,12 @@
    2.15        if (max != NULL)
    2.16          {
    2.17            mprSet.insert (max->neighborMainAddr);
    2.18 -          for (TwoHopNeighborSet::iterator twoHopNeigh = N2.begin ();
    2.19 -               twoHopNeigh != N2.end (); )
    2.20 +          // Remove the nodes from N2 which are now covered by a node
in the MPR set.
    2.21 +          for (TwoHopNeighborSet::iterator twoHopNeigh = N2.begin ();
twoHopNeigh != N2.end (); )
    2.22              {
    2.23 -              if (twoHopNeigh->neighborMainAddr ==
max->neighborMainAddr)
    2.24 +              if (coveredTwoHopNeighbors.find
(twoHopNeigh->twoHopNeighborAddr) != coveredTwoHopNeighbors.end ())
    2.25                  {
    2.26 +                  NS_LOG_LOGIC ("2-hop neigh. " <<
twoHopNeigh->twoHopNeighborAddr << " is already covered by an MPR.");
    2.27                    twoHopNeigh = N2.erase (twoHopNeigh);
    2.28                  }
    2.29                else

This code makes use of coveredTwoHopNeighbors.  In the previous version only N2
was updated and consulted in this last part of the code (step 4 of the
algorythm).  It was assumed that a two-hop neighbor is covered if it is not
listed in N2 anymore, so there really was no need to update both
coveredTwoHopNeighbors and N2 at the same time, right?

It was never explained in this bug report what MPR set was being selected and
which one should in theory have been selected.  Keep in mind that OLSR contains
an heuristic for find a _good_ MPR set; it is not guaranteed to find the _best_
MPR set.  Find the best set is an NP-complete problem...
------- Comment #5 From 2009-11-17 07:49:38 EDT -------
  Hi, Gustavo,

  I will support this bug while Kirill is on the vacation. 

(In reply to comment #4)
> Hi, I am the (mostly-absent) maintainer of the OLSR code, but I see the patch
> was committed without me having time to even look at the issue, much less
> approve.  

  I'm terribly sorry, we will never do it again.

> While I love that a unit test was added, I have serious doubts about
> the patch:

  I agree with you that proposed (and commited) "fix" is completely wrong. 

  As I can see, the real error was in the lines:

          for (TwoHopNeighborSet::iterator twoHopNeigh = N2.begin ();
twoHopNeigh != N2.end (); )
            {
              if (twoHopNeigh->neighborMainAddr == neighbor->neighborMainAddr)
                {
                  NS_LOG_LOGIC ("2-hop neigh. " <<
twoHopNeigh->twoHopNeighborAddr << " is already covered by an MPR.");
                  twoHopNeigh = N2.erase (twoHopNeigh);
                }
              else
                {
                  twoHopNeigh++;
                }
            }

  you were used to remove covered 2-hop neighbors from N2 set. This code
ignores the fact that N2 set is made of pairs {neighbor, two hop neighbor} and
that the same 2-hop neighbor can appear multiple times in N2 with different
1-hop neighbors. All this records must be deleted when two hop neighbor is
covered by MPR. 

  The simplest topology which illustrates this is 

  O -- A
  |    |
  B -- C  

  where O selects MPR set and N = {A, B} N2 = {{A, C}, {B, C}}. When A is
selected as MPR your code will remove tuple {A, C} from N2 but will not remove
record {B, C} and this will case B to be erroneously selected as the second
MPR.

  The proposed fix-to-fix is attached (It should be applied to ns-3-dev as is),
please take a look.

> It was never explained in this bug report what MPR set was being selected and
> which one should in theory have been selected. 

  I agree that current unit test topology is overcomplicated. If you don't
object I will simplify existing unit test to reproduce minimal topology drawn
above and add some more MPR selection tests. You can do this by yourself if you
like to.  

>  Keep in mind that OLSR contains
> an heuristic for find a _good_ MPR set; it is not guaranteed to find the _best_
> MPR set.  Find the best set is an NP-complete problem...

  I know. Now we are talking about correct implementation of recommended
heuristic from RFC 3626 8.3.1.
------- Comment #6 From 2009-11-17 07:50:01 EDT -------
Created an attachment (id=655) [details]
Fix #2
------- Comment #7 From 2009-11-17 07:52:33 EDT -------
Created an attachment (id=656) [details]
Fix #2, to be applied above first fix.
------- Comment #8 From 2009-11-18 05:36:51 EDT -------
*sigh* I am stuck spending my time fixing Python bindings, so it might take a
while for me to get to this.
------- Comment #9 From 2009-11-18 05:41:05 EDT -------
(In reply to comment #8)
> *sigh* I am stuck spending my time fixing Python bindings, so it might take a
> while for me to get to this.

  No problem. Meanwhile, can I commit new OLSR unit & system tests (not fixes
just tests :) without your explicit permission?
------- Comment #10 From 2009-11-18 05:49:40 EDT -------
(In reply to comment #9)
> (In reply to comment #8)
> > *sigh* I am stuck spending my time fixing Python bindings, so it might take a
> > while for me to get to this.
> 
>   No problem. Meanwhile, can I commit new OLSR unit & system tests (not fixes
> just tests :) without your explicit permission?

Sure.  Just please leave the MprCompuation code intact for a while until I have
time to review it.  It is a very complicated piece of code, not easy to review
it... :-/
------- Comment #11 From 2009-11-19 06:39:46 EDT -------
(From update of attachment 656 [details])
Cristal clear patch.  Please commit.