Bug 1099

Summary: AODV performance problems
Product: ns-3 Reporter: Tom Henderson <tomh>
Component: aodv Assignee: John Abraham <john.abraham.in>
Status: RESOLVED FIXED    
Severity: normal CC: john.abraham.in, ns-bugs
Priority: P3    
Version: pre-release   
Hardware: All   
OS: All   
Attachments: aodv test program
aodv test output
olsr test program
OLSR test output
DSDV test program
DSDV test output
Aodv test output (2)
fix bug in generating RREQ
improve logging
example program for future test use
adjacency vs time
patch
comparison with latest patch
packet receive rate comparison btw olsr,aodv,dsdv
Patch

Description Tom Henderson 2011-04-15 19:10:51 EDT
It has been reported for some time now that AODV has performance problems when compared to OLSR (and now DSDV) in mobile networks.  This pops up periodically on the ns-3-users mailing list as well.

The attached programs and the resulting test data illustrate that AODV performance is nowhere close to that of the other MANET protocols. The second and third columns of these CSV files show the reception of data at the packet sinks after time 50. DSDV achieves up to 45 packets/s, and OLSR achieves around 40 packets/s. AODV, however, achieves only a few packets per second (except at time 52), and nothing after time 180.

Some existing AODV bug reports have patches (bug 1042, RERR, and bug 1097, UpdateRouteLifetime), but these are not sufficient to remedy the problem; something else is going on.
Comment 1 Tom Henderson 2011-04-15 19:12:22 EDT
Created attachment 1067 [details]
aodv test program
Comment 2 Tom Henderson 2011-04-15 19:12:55 EDT
Created attachment 1068 [details]
aodv test output
Comment 3 Tom Henderson 2011-04-15 19:13:18 EDT
Created attachment 1069 [details]
olsr test program
Comment 4 Tom Henderson 2011-04-15 19:13:45 EDT
Created attachment 1070 [details]
OLSR test output
Comment 5 Tom Henderson 2011-04-15 19:14:03 EDT
Created attachment 1071 [details]
DSDV test program
Comment 6 Tom Henderson 2011-04-15 19:14:20 EDT
Created attachment 1072 [details]
DSDV test output
Comment 7 Tommaso Pecorella 2011-04-25 11:57:02 EDT
Created attachment 1098 [details]
Aodv test output (2)

The results are quite dodgy as well.

While the OLSR results are consistent on my machine (i.e., identical output), the AODV output is radically different.

On a side note, DSDV fires an error...
High precision 128 bits multiplication error: multiplication overflow.

I don't know whether it's a core bug, a DSDV bug, or a simulation bug. To be safe, I filed a report:
http://www.nsnam.org/bugzilla/show_bug.cgi?id=1116

T.
Comment 8 Tom Henderson 2011-05-12 10:02:47 EDT
I've been looking into this some more. Tommaso's results show that AODV no longer works in the 50-node mobile WiFi scenario, and I have confirmed this. In that mobile scenario, the nodes try to send 6000 datagrams over 150 seconds.

These are the current results:
Packets sent:  ~6000
Packets received (OLSR from bug 999 tests): 5436
Packets received (DSDV from bug 999 tests): 4195
Packets received (AODV: current ns-3-dev):  2
Packets received (AODV: ns-3-dev with patch for bug 1042):  240
Packets received (AODV: ns-3-dev with patches for both 1042 and 1097):  315

I would like to apply the patches for bugs 1042 and 1097 and keep working on the remaining issues.
Comment 9 Tom Henderson 2011-05-20 18:48:08 EDT
Created attachment 1146 [details]
fix bug in generating RREQ
Comment 10 Tom Henderson 2011-05-20 18:48:41 EDT
Created attachment 1147 [details]
improve logging
Comment 11 Tom Henderson 2011-05-20 18:49:23 EDT
Created attachment 1148 [details]
example program for future test use

This replaces the previous separate programs for aodv, olsr, and dsdv
Comment 12 Tom Henderson 2011-05-20 19:18:11 EDT
I've been looking at this for a few days now. I fixed one additional bug (patch attached): the route was not being set to IN_SEARCH when an RREQ was generated.

The attached test program generates a MANET in a rectangular region, by default with 50 nodes and 10 source/sink pairs. Transmit power, mobility, etc. can be varied. It is based on the original program submitted by Justin Rohrer.

As transmit power is varied (default 7.5 dBm), the network becomes effectively denser as power increases and sparser as power decreases. At a very high power level (e.g., 50 dBm), every node is always within one hop of every other node.

The problem with AODV is observed with the program defaults: 50 nodes and 10 source/sink pairs. I tried to reduce the scenario size, but found that in smaller networks and under different conditions AODV did pretty well. Here are some examples for a 10-node network with 5 source/sink pairs; the numbers are the fraction of user data packets successfully received. RngRun was set to 5, and OLSR was also run for comparison.

Transmit power (dBm)   AODV ns-3.10   AODV with bug fixes   OLSR
50                     1              1                     1
10                     0.45           0.9                   0.88
7.5                    0.44           0.7                   0.81
5                      0.35           0.64                  0.54
2.5                    0.28           0.45                  0.32
0                      0.27           0.30                  0.21

However, with 50 nodes and 10 source/sink pairs we see the following (only two values were tested today; the data takes longer to generate):

Transmit power (dBm)         AODV with bug fixes     OLSR
10                             0.18                   0.96
0                              0.04                   0.47

It is difficult to untangle what is going on with AODV in this denser network with more traffic. However, there is a lot of AODV traffic in the network (between 1 and 2 Mb/s measured at one node), and the network seems to bog down in route discovery and responses as time goes on.

The problem gets worse as more source/sink pairs are added, since AODV is data-driven. For instance, reducing to only one source/sink pair raises the delivery ratio from 0.18 to 0.74 at a transmit power of 10 dBm (disclaimer: this is a single data point and has not been systematically studied).

AODV is known to have problems in dense networks:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4351754&tag=1

This ns-3 implementation does not implement expanding ring search. The protocol defaults are set to the values in RFC 3561, some of which are fairly unreasonable (e.g., NET_DIAMETER = 35), and the corresponding timers are long. So the situation may improve with some parameter tuning.

I am not convinced there are no protocol problems at larger densities, but more systematic studies are needed to determine this.  The example program can be used as a starting point.
Comment 13 Tom Henderson 2011-05-20 19:21:16 EDT
I recommend that for ns-3.11, we take the following steps:

1) commit the bug fix and logging patches above
2) add the example program to allow others to continue to study this
3) provide some documentation in a (new) AODV model library chapter, with some of these test results and caution about dense network results.  The doxygen in src/aodv/doc/aodv.h can be used as the basis for this chapter.

Beyond ns-3.11, I would recommend that we design a study to systematically document the performance of our MANET routing protocols.
Comment 14 Tom Henderson 2011-05-23 00:48:36 EDT
(In reply to comment #13)
> I recommend that for ns-3.11, we take the following steps:
> 
> 1) commit the bug fix and logging patches above
changesets e860437bafcc and a3a0c0cfbba3
> 2) add the example program to allow others to continue to study this
changeset 3895c1620ccb

> 3) provide some documentation in a (new) AODV model library chapter, with some
> of these test results and caution about dense network results.  The doxygen in
> src/aodv/doc/aodv.h can be used as the basis for this chapter.

leaving open for documentation updates; removing blocker status
Comment 15 John Abraham 2011-06-02 06:55:49 EDT
It was observed that, for this particular use case, the protocol kept route entries IN_SEARCH for a whopping 15 seconds, even though the neighbor adjacency had been removed much earlier (after the hello_allowed_loss interval or after RERRs). Furthermore, 15 s is a significant amount of time because the nodes move at 20 m/s (15 * 20 = 300 m, which is beyond the radio range), which makes for a very unstable network.

With DELETE_PERIOD reduced to zero, AODV performance is similar to or better than that of the other protocols.
Comment 16 John Abraham 2011-06-03 15:13:09 EDT
This is the second problem (I don't know the root cause yet). See the attached picture: with L7 traffic on, the adjacencies dip. For reference, the adjacency in the absence of application traffic is also shown.
Comment 17 John Abraham 2011-06-03 15:13:36 EDT
Created attachment 1157 [details]
adjacency vs time
Comment 18 John Abraham 2011-06-03 18:32:52 EDT
(In reply to comment #16)
> This is the second problem  (don’t know the root-cause yet).See attached
> Picture. .With L7 traffic ON, the adjacencies dip.Also for reference the
> adjacency in the absence of app traffic is shown.

The problem can be traced to the hello timers being repeatedly cancelled whenever an AODV packet was received or sent. Thus, when L7 traffic was sent, the resulting RREQs and RREPs meant the hello timers were barely ever triggered, which caused a severe loss of adjacency.

See attached patch "bug1099.diff" and comparison between protocols achieved now (compare.jpg)
Comment 19 John Abraham 2011-06-03 18:33:33 EDT
Created attachment 1158 [details]
patch
Comment 20 John Abraham 2011-06-03 18:35:21 EDT
Created attachment 1159 [details]
comparison with latest patch
Comment 21 John Abraham 2011-06-15 23:59:13 EDT
Created attachment 1164 [details]
packet receive rate comparison btw olsr,aodv,dsdv
Comment 22 John Abraham 2011-06-16 00:02:46 EDT
Created attachment 1165 [details]
Patch

1. When Node B receives an RERR from Node A about some unreachable destination Node C, this does not necessarily mean that Node C is not a neighbor of Node B.

2. Hello timers were not being scheduled at ~1 s intervals once RREQ/RREP traffic started.
Comment 23 John Abraham 2012-02-03 06:28:46 EST
See also bugs 1193, 1194, 1189, and 1188.
Since the fixes for 1193 and 1194 address several issues, this bug will be closed for now and reopened if more users report performance issues.