Bug 667

Summary: ECMP operation in global routing
Product: ns-3 Reporter: wilson thong <wilsonwk>
Component: routingAssignee: Tom Henderson <tomh>
Status: REOPENED ---    
Severity: normal CC: antti.makela, mathieu.lacage, ns-bugs, tomh, tommaso.pecorella
Priority: P3    
Version: ns-3-dev   
Hardware: All   
OS: All   
URL: http://groups.google.com/group/ns-3-users/browse_thread/thread/ca11d3ab23dc67e2#
Attachments: 1) Patch to amend CandidateQueue class
2) Patch to amend the SPF trees calculation and global routing protocol
3) Optional. To include more NS_LOG_INFO for debugging
updated combined patch
patch for per-flow load splitting
test case to reproduce problem
topology for the test program

Description wilson thong 2009-08-23 07:28:38 EDT
Created attachment 567 [details]
1) Patch to amend CandidateQueue class

*** Introduction ***

As of changeset 258cf77942bc, the global routing implementation does not consider equal-cost-multi-paths (ECMP) when building routing tables on nodes. The patches included in this bug report is intended to enhance the global routing implementation in this respect.

** Usage node ***

Apply the following patches in the order listed below (from 1) to 2) to 3))
1) enable-ecmp-change-candidate-queue-to-consider-lsa-type-when-ordering.patch
2) enable-ecmp-in-global-routing.patch
3) enable-ecmp-add-ns-log-info.patch

Set the "RandomEcmpRouting" attribute to true and packets are routed uniformly at random on ECMP. To disable random routing among ECMP, set the attribute to false.

There exists several methods to balance traffic among several ECMP, but currently only pure-random is implemented.

*** Change node ***

Four major changes to global routing implementation are listed below

- Amended CandidateQueue class so that network SPFVertex is always chosen before router SPFVertex when they are both equal distance from the root. This is an outstanding implementation in quagga also, according to the Kunihiro's comment that is left in the quagga 0.98.6 source code.

- Amended GlobalRouteManagerImpl class so that it is able to store multiple parents SPFVertex and multiple next hops from the root. Regarding updating the list of next hops, the old list of next hops for a node is just flushed with the new list of next hops whenever shorter paths to the node is found. Discussion on this flushing maybe necessary in view of the comments left in the method quagga-0.98.6/ospf_spf.c/ospf_spf_consider_nexthop ().

- Amended GlobalRouteManagerImpl class so that it now considers the case when a candidate SPFVertex is found to have equal distance to the root with an existing SPFVertex in the SPF tree. The amendment also allows adding multiple route entires into a root's routing table in cases there are two or more root's next hops for reaching a node.

- Amended Ipv4GlobalRouting class so that it can select one of the multiple routes for a given node when there exist ECMP. Currently, only the following selection method is implemented --- a route is selected from multiple ECMP routes uniformly at random.

*** Reference ***
- RFC2328, Section 16.1
- quagga-0.98.6\ospfd\ospf_spf.c
Comment 1 wilson thong 2009-08-23 07:29:39 EDT
Created attachment 568 [details]
2) Patch to amend the SPF trees calculation and global routing protocol
Comment 2 wilson thong 2009-08-23 07:30:48 EDT
Created attachment 569 [details]
3) Optional. To include more NS_LOG_INFO for debugging
Comment 3 Antti Mäkelä 2009-08-23 14:25:50 EDT
Just in case - I hope you are aware of my patch in bug #614 - not sure if it affects your design and how much.
Comment 4 wilson thong 2009-08-23 15:37:28 EDT
Updated the "usage note" below. The fully qualified name for the attribute should be "ns3::Ipv4GlobalRouting::RandomEcmpRouting"

(In reply to comment #0)
> Created an attachment (id=567) [details]
> 1) Patch to amend CandidateQueue class
> 
> *** Introduction ***
> 
> As of changeset 258cf77942bc, the global routing implementation does not
> consider equal-cost-multi-paths (ECMP) when building routing tables on nodes.
> The patches included in this bug report is intended to enhance the global
> routing implementation in this respect.
> 
> ** Usage node ***
> 
> Apply the following patches in the order listed below (from 1) to 2) to 3))
> 1) enable-ecmp-change-candidate-queue-to-consider-lsa-type-when-ordering.patch
> 2) enable-ecmp-in-global-routing.patch
> 3) enable-ecmp-add-ns-log-info.patch
> 
> Set the "ns3::Ipv4GlobalRouting::RandomEcmpRouting" attribute to true and packets are routed uniformly
> at random on ECMP. To disable random routing among ECMP, set the attribute to
> false.
> 
> There exists several methods to balance traffic among several ECMP, but
> currently only pure-random is implemented.
> 
> *** Change node ***
> 
> Four major changes to global routing implementation are listed below
> 
> - Amended CandidateQueue class so that network SPFVertex is always chosen
> before router SPFVertex when they are both equal distance from the root. This
> is an outstanding implementation in quagga also, according to the Kunihiro's
> comment that is left in the quagga 0.98.6 source code.
> 
> - Amended GlobalRouteManagerImpl class so that it is able to store multiple
> parents SPFVertex and multiple next hops from the root. Regarding updating the
> list of next hops, the old list of next hops for a node is just flushed with
> the new list of next hops whenever shorter paths to the node is found.
> Discussion on this flushing maybe necessary in view of the comments left in the
> method quagga-0.98.6/ospf_spf.c/ospf_spf_consider_nexthop ().
> 
> - Amended GlobalRouteManagerImpl class so that it now considers the case when a
> candidate SPFVertex is found to have equal distance to the root with an
> existing SPFVertex in the SPF tree. The amendment also allows adding multiple
> route entires into a root's routing table in cases there are two or more root's
> next hops for reaching a node.
> 
> - Amended Ipv4GlobalRouting class so that it can select one of the multiple
> routes for a given node when there exist ECMP. Currently, only the following
> selection method is implemented --- a route is selected from multiple ECMP
> routes uniformly at random.
> 
> *** Reference ***
> - RFC2328, Section 16.1
> - quagga-0.98.6\ospfd\ospf_spf.c
> 

Comment 5 wilson thong 2009-08-23 15:41:56 EDT
Seem bug #614 doesn't affect to logic here, just some rework on coding. ^^

And I try my best to follow closely how quagga-0.98.6 does the ECMP. As long as we all follow quagga, I don't see the global routing design needs any big changes.

(In reply to comment #3)
> Just in case - I hope you are aware of my patch in bug #614 - not sure if it
> affects your design and how much.
> 

Comment 6 Tom Henderson 2009-12-15 02:30:17 EST
Created attachment 700 [details]
updated combined patch

update to latest ns-3-dev, merge, some coding style and typo fixes.  Should be ready to commit.
Comment 7 Tom Henderson 2009-12-15 02:33:54 EST
Wilson, sorry for the delay in reviewing.  I think this is OK to commit now but I had two comments/requests:

1) what do you think about adding an attribute (and new behavior) that will cause all packets for a given flow to traverse the same route, but each flow gets a random route (perhaps by hashing some flow identifiers and modulo the number of routes)?  I think that type of behavior is more likely/useful than a per-packet randomization

2) it would be nice to have a simple test topology showing this behavior in action in our test suite, because I fear that this type of global routing behavior could be prone to breakage if it is not covered in regression testing.

If you agree, I can merge the patch and during the maintenance phase, could you help me with resolving the above two items?
Comment 8 Tom Henderson 2009-12-16 00:59:45 EST
changeset: afb51c7f34c2
Comment 9 wilson thong 2009-12-16 01:19:58 EST
Tom, thanks for reviewing and using the patches~!!

I am glad to help doing the two requests you made ^^ Will update you when they are ok~


Thanks~!!
Wilson
Comment 10 Tom Henderson 2010-01-29 10:44:26 EST
Need to finish off the two issues left open after the main patch was committed.
Comment 11 Tom Henderson 2012-12-30 17:01:44 EST
Created attachment 1491 [details]
patch for per-flow load splitting

This patch adds a new attribute "FlowEcmpRouting" to enable load balancing that keeps flows on the same path.  It is mutually incompatible with the per-packet "RandomEcmpRouting" option.  

The implementation just performs modulo division of the integer formed by summing the integer values of the flow's 5-tuple fields, modulo the number of paths.  More sophisticasted algorithms (e.g. RFC 2292) exist; does anyone want to try to improve the balancing approach?
Comment 12 Tom Henderson 2012-12-30 17:04:22 EST
While doing this, I discovered that the UDP implementation could be improved (to pass a packet in with a formed UDP header to RouteOutput()) which would allow balancing on locally outputted packets.  I'll commit a fix for that separate from this patch.

Also, before committing, I would write a test along the lines of the proposed example.
Comment 13 Tommaso Pecorella 2015-11-28 11:39:00 EST
Beware: the proposed modification ONLY works with TCP or UDP. If FlowECMP is set to true it will abort as soon as a packet different from TCP or UDP is sent.


(In reply to Tom Henderson from comment #11)
> Created attachment 1491 [details]
> patch for per-flow load splitting
> 
> This patch adds a new attribute "FlowEcmpRouting" to enable load balancing
> that keeps flows on the same path.  It is mutually incompatible with the
> per-packet "RandomEcmpRouting" option.  
> 
> The implementation just performs modulo division of the integer formed by
> summing the integer values of the flow's 5-tuple fields, modulo the number
> of paths.  More sophisticasted algorithms (e.g. RFC 2292) exist; does anyone
> want to try to improve the balancing approach?
Comment 14 Tom Henderson 2017-08-05 14:58:35 EDT
Created attachment 2892 [details]
test case to reproduce problem
Comment 15 Tom Henderson 2017-08-05 14:58:56 EDT
Created attachment 2893 [details]
topology for the test program
Comment 16 Tom Henderson 2017-08-05 15:01:16 EDT
This issue has been open a long time and some previous comments (e.g., #7, #11) may still need to be addressed.

However, Edgar Costa Molero (edgarcosta@hotmail.es) recently reports a problem as follows, and provides a test case:

 - Nodes do not learn all the possible multiple paths when using CSMA interfaces, but they do when using point-to-point (for my work I used p2p so I am fine but I just want you to know that the problem still exists).
-  In the example, every node has type “Node” and I only send traffic from node 0 to node 12 (see red nodes in the image).
-  If you do that and enable random per packet ECMP (your only ecmp algorithm), Node 1 only uses its output number 0, however node 6 uses all the 4 possible outputs. You can see that happening looking at different places:
- 1) If you check the animation you will clearly see that there is a problem with csma. (not sure if you will be able to use the animation with dev.
- 2) You can have a look at the routing table of node 1, however that can be a bit confusing.
- 3) You can also add a “print/log” line in ipv4-global-routing.cc in the line where it checks if ecmp is enabled  -> “if (m_randomEcmpRouting), here you could add: NS_LOG_UNCOND(Names::FindName(m_ipv4->GetObject<Node>()) << " " << selectIndex);

- In the ecmp-problem.cc test case code you can also see in the line 282 that the csma helper is declared, you can comment  those lines and uncomment the lines below (from 290) to use a p2p interface instead (note that it is also named csma, so you dont have to change anything else in the file when uncommenting).
- Also, when you use CSMA and  there are multiple paths, this assertion will be called: (so for my tests i commented it out)
 assert failed. cond="m_ecmpRootExits.size () <= 1", msg="Assumed there is at most one exit from the root to this vertex", file=../src/internet/model/global-route-manager-impl.cc, line=316
 terminate called without an active exception