Bugzilla – Bug 1041
BindToNetDevice does not exclude the layer 3 routing process
Last modified: 2016-01-13 11:20:46 EST
Created attachment 1022 [details]
Disable layer 3 routing with BindToNetDevice sockets
As defined in the documentation, BindToNetDevice should emulate the behavior of a socket configured using setsockopt() with the SO_BINDTODEVICE option.
However, using this binding does not prevent the normal layer 3 routing procedure from being called.
If a node has more than one device with an IP address on the same subnet, the routing process may select an interface different from the one bound to the socket.
From , I see:
"""SO_BINDTODEVICE forces packets on the socket to only egress the bound interface, regardless of what the IP routing table would normally choose. Similarly only packets which ingress the bound interface will be received on the socket, packets from other interfaces will not be delivered to the socket."""
So, basically, SO_BINDTODEVICE affects both sending and receiving packets through a socket. However, when I implemented the functionality I thought that it was supposed to affect only the receiving side, not the sending side. So my SO_BINDTODEVICE implementation is incomplete.
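For reference, a minimal sketch of the Linux call being emulated. The helper name BindSocketToDevice is hypothetical; setsockopt() with SOL_SOCKET and SO_BINDTODEVICE is the real Linux interface. Depending on the kernel, the call may need CAP_NET_RAW and can fail with EPERM, so treat this as an illustration rather than a reference implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <net/if.h>      // IFNAMSIZ
#include <sys/socket.h>  // setsockopt, SOL_SOCKET, SO_BINDTODEVICE

// Hypothetical helper: bind an already-open socket to a named interface.
// Returns 0 on success or a negative errno value on failure.
int BindSocketToDevice(int fd, const std::string& ifName)
{
  // Interface names must fit in IFNAMSIZ bytes, including the trailing NUL.
  if (ifName.size() >= IFNAMSIZ)
    {
      return -EINVAL;
    }
  // After this succeeds, the kernel only sends and receives on ifName.
  if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifName.c_str(),
                 static_cast<socklen_t>(ifName.size() + 1)) != 0)
    {
      return -errno;
    }
  return 0;
}
```

A caller would open a socket, call BindSocketToDevice(fd, "eth0") (the interface name is only an example) and check for a negative return value before sending.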
I won't comment on the patch itself for now. I find it really hard to read patches not in "unified" format. Maybe Tom will have better luck reading it...
I recently had some trouble with BindToNetDevice as well (see bugs #1527 and #1528), and I stumbled upon the very same ns-3 bug (or behaviour).
The pros of fixing it: it *should* work like this. Plus in the ns-3 Socket documentation it is stated it should work in that way.
The cons of fixing it: someone could have misused the [bad] behaviour.
In my opinion this is a simple bug and should be fixed. If someone used the sockets in the wrong way... well, they can fix it.
Anyway, I think the solution might be much simpler than expected, but I have to take a deeper look at it. Unfortunately the proposed patch is not so clear (see Gustavo's comment). Moreover it was for ns-3.10, so I'd prefer to work on it from scratch.
Hopefully I'll post a patch in the next few days, along with the test program for this bug, #1527 and #1528.
Ok, I checked the proposed patch and it is not that obvious. Moreover, it changes a lot of internals which I'm not so keen on changing (unless it's proven we do need 'em).
On the other hand I found out that there *is* a check for the correct oif, at least in Ipv4StaticRouting.
Now, the issue is slightly more complex than it seems. At first one might argue that a socket bound to a NetDevice doesn't need to ask L3 routing. However, this is true only for "raw" sockets, where you want to send a packet as-is through a specified output device.
For L4 sockets, you need L3 routing even for NetDevice-bound sockets, as the packet destination may or may not be on the same subnet (hence, you need L3 routing to find the router address).
As a consequence, the bug topic is misleading. You do need the L3 routing anyway, however it must check for the bound netDevice (if any) in order to limit the scope of the routing lookup.
For my own understanding, I'd divide the behaviour into 3 separate requirements:
1) L4 sockets: must pass the BoundNetDevice to L3 in order to limit the L3 routing decisions (and the L3 routing algorithms must enforce this).
2) L3 sockets (IpRaw), same as before.
3) L2 sockets: the problem doesn't apply, as they're not calling L3 routing anyway.
Hence, the bug should be re-classified as "some routing protocol implementation isn't enforcing the bound NetDevice properly".
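To make requirements 1) and 2) concrete, here is a toy sketch of a routing lookup that honours the bound device. Route, Lookup and Ip are hypothetical names used only for illustration; this is not the Ipv4StaticRouting code. The only assumption is that the socket's bound device, if any, is passed down as an output-interface constraint.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified route entry (not an ns-3 class).
struct Route
{
  uint32_t network;  // destination network
  uint32_t mask;     // contiguous netmask
  uint32_t gateway;  // 0 means "directly connected"
  int ifIndex;       // output interface
};

// Build a host-order IPv4 address from its four octets.
constexpr uint32_t Ip(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
  return (a << 24) | (b << 16) | (c << 8) | d;
}

// Longest-prefix match, restricted to boundIf when the socket is bound.
// Returns no value when no route exists through the bound interface.
std::optional<Route> Lookup(const std::vector<Route>& table, uint32_t dest,
                            std::optional<int> boundIf)
{
  std::optional<Route> best;
  for (const Route& r : table)
    {
      if ((dest & r.mask) != r.network)
        {
          continue;  // route does not cover the destination
        }
      if (boundIf && r.ifIndex != *boundIf)
        {
          continue;  // route leaves through another interface
        }
      // For contiguous masks, a larger mask value is a longer prefix.
      if (!best || r.mask > best->mask)
        {
          best = r;
        }
    }
  return best;
}
```

With two interfaces on 10.0.0.0/24, an unconstrained lookup returns the first matching route, a lookup bound to the second interface returns the route through that interface, and a lookup for which no route exists through the bound interface returns nothing, which the socket would report as a send failure.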
At this point we need a test program to check the correct (or incorrect) behaviour of each routing algorithm.
Until we have a proper testcase I wouldn't change the code.
I ran into a problem with int TcpSocketBase::Bind (const Address &address) as well. The call to Bind does not update m_boundnetdevice, so if your host has 2 sockets bound to two different devices, packets will flow out of the same interface :(
I fixed it in the MPTCP branch and will try to backport it to vanilla ns-3 with a testcase.
Any advice on where to put the testcase? Create a new one?
(In reply to Matthieu Coudron from comment #4)
> I met a problem with int TcpSocketBase::Bind (const Address &address) as
> well. The call to bind does not update m_boundnetdevice so if your host has
> 2 sockets bound to two different devices, packets will flow out of the same
> interface :(
> I fixed it the MPTCP branch and will try to backport to vanilla ns3 with a
> testcase.
> Any advice on where to put the testcase ? create a new one ?
A new test case would be perfect.
If I just create 2 sockets and bind them to different interfaces, I believe the transfers will be successful but all packets will use the same path instead of 2 paths. Hence I need to check on which interface the packet was received, but I don't think I can check that easily.
Maybe I could just check if packets transit on both links ?
I set out to write a failing test case for this bug. I ended up with a passing test case; maybe this is not an issue anymore?
If we have two nodes with two parallel paths:
(txDev1) 10.0.0.1 <------------------------------> 10.0.0.2 (rxDev1)
(txDev2) 184.108.40.206 <------------------------------> 220.127.116.11 (rxDev2)
if we bind the TxSocket to txDev1 and RxSocket to rxDev1, and send to 10.0.0.2, the (UDP) packet is received, as expected.
If we bind the TxSocket to txDev1 and RxSocket to rxDev2, and send to 10.0.0.2, the packet is discarded in Ipv4Endpoint on node n1 for failing the incoming interface test.
If we bind the TxSocket to txDev2, Send() returns -1, so no packet is sent. This is because UdpSocketImpl does consider m_boundnetdevice and passes it as a parameter to the routing protocol.
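The three cases above can be summarised as a small decision function. This is only a toy model under stated assumptions: the two links are disjoint, nothing forwards between them, and the names Outcome and Simulate are hypothetical. It is not the attached test case.

```cpp
#include <cassert>

// Possible results of one send attempt in the two-link topology.
enum class Outcome
{
  Received,           // packet delivered to the receiving socket
  DroppedAtEndpoint,  // packet arrived but failed the incoming interface check
  SendFailed          // Send() returned -1, no packet left the node
};

// Links are numbered 1 (txDev1 <-> rxDev1) and 2 (txDev2 <-> rxDev2).
// destLink is the link on which the destination address lives.
Outcome Simulate(int txBoundLink, int rxBoundLink, int destLink)
{
  // Egress: routing is limited to the bound device, so there is no route
  // to the destination unless the bound device is on the destination's link.
  if (txBoundLink != destLink)
    {
      return Outcome::SendFailed;
    }
  // Ingress: the packet arrives on destLink; a socket bound to the other
  // device must not receive it.
  if (rxBoundLink != destLink)
    {
      return Outcome::DroppedAtEndpoint;
    }
  return Outcome::Received;
}
```

Sending to 10.0.0.2 corresponds to destLink = 1, so Simulate(1, 1, 1), Simulate(1, 2, 1) and Simulate(2, 1, 1) reproduce the three observations in order.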
So, is this RESOLVED FIXED? Or is there another failing test case? (note, I did not check TCP, or IPv6, does anyone want to test that?)
Created attachment 2199 [details]
passing UDP IPv4 test case
Thanks for the test.
Let me test this with TCP sockets. In ns-3.23 there were obvious failures for TCP, and I don't think they have been solved since then, but I'll check tomorrow or this weekend.
Created attachment 2200 [details]
I quickly added a TCP test case, blocking the path that packets shouldn't use with SimpleChannel::BlackList, i.e. packets should go through channel2, so I block channel1.
With TCP, the test fails: no packet is received, which means the packet went through channel1.
Also, testing on m_receivedPacket->GetSize () is dangerous because if the test fails, the test crashes, as in:
NS_TEST_EXPECT_MSG_EQ (m_receivedPacket->GetSize (), 123, "Correctly bound NetDevices");
I have a patch for this, but can you confirm the enclosed test approach is good?
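One possible way to avoid the crash, assuming it comes from m_receivedPacket being null when nothing was received (the thread does not state the cause explicitly), is to check the pointer before dereferencing it. The sketch below uses std::shared_ptr as a stand-in for ns-3's Ptr<Packet>; Packet is a stand-in struct and ReceivedSizeOrZero is a hypothetical helper, not part of the attached patch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Stand-in for ns-3's Packet, reduced to what the check needs.
struct Packet
{
  uint32_t size;
  uint32_t GetSize() const { return size; }
};

// Hypothetical helper: report 0 bytes instead of dereferencing a null
// pointer when no packet was received.
uint32_t ReceivedSizeOrZero(const std::shared_ptr<Packet>& received)
{
  return received ? received->GetSize() : 0;
}
```

The test would then compare ReceivedSizeOrZero (m_receivedPacket) against 123, so a missing packet shows up as a failed expectation rather than a crash.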
I made some progress on the multihomed TCP test (without considering BindToNetDevice), and packets that should go through channel 2 do go through channel 2 (which was not the case during the summer of 2015, in my experience).
Will try to complete the TCP test with BindToNetDevice calls.