Difference between revisions of "GSOC2015MpTcpImplementation"
Revision as of 14:07, 25 August 2015
- 1 Project overview
- 2 On the MPTCP subject
- 3 On the per node clock
- 4 Expected deliverables
- 4.1 Validation
- 4.2 Week 1 - Step 1
- 4.3 Week 2 - Deliverable for Step 1; start of Step 2
- 4.4 Week 3 - Step 2
- 4.5 Week 4 - Step 2
- 4.6 Week 5 - Deliverable for Step 2 and Step 3
- 4.7 Week 6 - Step 3
- 4.8 Week 7 - Deliverable for Step 3; start of Step 4
- 4.9 Week 8 - Step 4
- 4.10 Week 9 - Step 4
- 4.11 Week 10 - Step 4 and Deliverables for Step 3 and 4
- 5 Weekly progress
Project overview
- Project: Implementation of MPTCP (Multipath TCP) + Implementation of per-node clocks
- Student: Matthieu Coudron
- Mentors: Tom Henderson, Vedran Miletic, Tommaso Pecorella, Peter Barnes
- Code: https://github.com/teto/ns-3-dev-git/tree/master (check out the different branches)
- About me: I am a PhD student working on multipath communications. I have a background in network and system security.
On the MPTCP subject
Thanks to last year's TCP option GSoC project, it is possible to implement Multipath TCP (MPTCP) in a clean way. MPTCP is an extension to TCP that is more and more popular (used in Apple's voice recognition system Siri, embedded in Yosemite and in some Citrix products, soon to be embedded in Proximus products). MPTCP is available in some (possibly out-of-tree) kernels (Linux, Mac OS, FreeBSD) and works even with adversarial middleboxes (contrary to SCTP), which was an important challenge. The second challenge is still pending, i.e., how to make the best use of path diversity? How to be better than TCP without being more aggressive than TCP at bottlenecks? There is no solution in the literature that answers this in a robust way. I hope that being able to run MPTCP in a simulator could foster research on that particular subject, since doing it with kernel code or creating a multihomed (3G/wired) setup can be complex (MPTCP kernel code is being refactored, and implementation is quite time consuming).
Here is a diff file of the beginning of one incomplete MPTCP implementation based on ns3.19: http://downloads.tuxfamily.org/bluecosmos/ns3/final.patch. It was generated with this kind of command (I just discovered the filterdiff utility, pretty cool):
 diff -ENwbur ~/ns3off/src/internet src/internet > test.patch
 cat test.patch | filterdiff -p0 -X toexclude.txt > final.patch
To help reviewers focus on the architecture, I removed some unnecessary files (but this is still a huge diff) and I have added some comments below about MPTCP and the code. To sum up, the main files to check are mptcp-socket-base.* and mptcp-subflow.* and the modifications made to tcp-socket-base.* .
1/ First of all, MPTCP doesn't require applications to be modified, and neither does this implementation: it just appears as another TCP variant, so the MPTCP socket works with all the code that can work with a TcpSocket.
2/ MPTCP is a TCP extension; all the signaling is done through TCP options.
3/ The application sees a *fake* TCP socket usually called the "meta socket". This socket then dispatches the data to send among the different TCP connections of the MPTCP connection (usually called subflows).
TcpSocketBase
|-MpTcpSocketBase (this is the "meta socket", a logical socket that dispatches the send buffer between the different MpTcpSubflows, and reorders the segments received on the different subflows for the application to see)
|
|-MpTcpSubflow (this is a copy/paste of TcpNewReno except that it handles MPTCP logic and adds/pops options when necessary)
4/ The standard demands that MPTCP should not be greedier than TCP, so there are congestion control algorithms specific to MPTCP. In the diff you will only find mptcp-cc-uncoupled. The way it's implemented, you subclass both MpTcpSocketBase and MpTcpSubflow into MpTcpSocketBaseUncoupled and MpTcpSubflowUncoupled.
5/ The path management files/classes are not used in the implementation and they don't have the same meaning as in the Linux kernel: these classes are meant to assign the unique IDs for each possible subflow (as required by the standard). In the Linux kernel, path management modules implement policies that decide whether or not the meta socket should establish a new subflow.
6/ MPTCP has a global sequence number (to reassemble packets in order at the receiver) that is conveyed through a TCP option. Every TCP sequence number should be mapped to an "MPTCP sequence number". There are strict rules concerning these mappings: once a mapping is sent, it can't be changed, and the data has to be sent, and resent even if it was received on another subflow, etc. The mappings are responsible for much of the complexity of the code. A mapping can't be removed as long as the whole data has not been received, and the data cannot be passed to the upper layer because there may be a checksum covering the whole mapping.
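To make the mapping rules of point 6/ more concrete, here is a minimal Python sketch of a per-subflow mapping table. It is not taken from the ns3 code: the names `Mapping` and `SubflowMappings` are hypothetical, sequence number wraparound is ignored, and the checksum is not modelled. It only illustrates that a mapping is immutable once announced and is dropped only when its whole range has been received.

```python
from dataclasses import dataclass


@dataclass(frozen=True)          # frozen: a mapping cannot be changed once created
class Mapping:
    dsn: int     # first MPTCP-level (data) sequence number covered
    ssn: int     # first subflow-level (TCP) sequence number covered
    length: int  # number of bytes covered

    def covers_ssn(self, ssn):
        return self.ssn <= ssn < self.ssn + self.length

    def to_dsn(self, ssn):
        if not self.covers_ssn(ssn):
            raise ValueError("ssn outside of this mapping")
        return self.dsn + (ssn - self.ssn)


class SubflowMappings:
    """Mappings announced on one subflow (hypothetical helper)."""

    def __init__(self):
        self._mappings = []

    def add(self, mapping):
        # An announced mapping may not be redefined, so overlaps are refused.
        for m in self._mappings:
            if mapping.ssn < m.ssn + m.length and m.ssn < mapping.ssn + mapping.length:
                raise ValueError("overlaps an already announced mapping")
        self._mappings.append(mapping)

    def ssn_to_dsn(self, ssn):
        for m in self._mappings:
            if m.covers_ssn(ssn):
                return m.to_dsn(ssn)
        return None

    def discard_fully_received(self, next_expected_ssn):
        # Keep every mapping that still has bytes at or after next_expected_ssn.
        self._mappings = [m for m in self._mappings
                          if m.ssn + m.length > next_expected_ssn]
```

A partially received mapping stays in the table, which is one reason the receive buffers cannot be freed as eagerly as with plain TCP.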
Features required from ns3:
- Need to decorrelate the TCP sender unacknowledged head (SND.UNA) from TcpTxBuffer
- MPTCP demultiplexing is not done on the 5-tuple but on the key embedded in the MPTCP capable option
- it should be possible to set a memory size for the meta buffer and to share this space with subflows, ie it should be possible for TCP buffers
One critical aspect of multipath protocols is the reordering problem, which usually requires larger buffers to get the same performance as single path protocols. The main challenge in simulating MPTCP correctly is to mimic the linux buffer mechanisms (in my opinion).
Nb: MPTCP has many mechanisms to deal with middleboxes and such, but I don't believe they are interesting to have in ns3, which should be used to analyze the algorithmic part. Thus none of the failure mechanisms are implemented (e.g. fallback to TCP in case the server is not MPTCP compliant, etc.).
On the per node clock
I would like to start implementing per-node clocks to be able to simulate time distribution protocols. Right now nodes are all perfectly synchronized in ns3 (they share the simulator clock). My goal is to be able to run NTPD in ns3-dce over ns3 nodes with drifting clocks. Time distribution experiments are hard to do in practice (do you control 2 or more stratum 1 NTP servers, and the traffic between them?), so I believe it makes sense, and I know of no simulator that does it. This proposal is a follow-up to my email to the dev ml: http://network-simulator-ns-2.7690.n7.nabble.com/Addition-of-per-node-clocks-in-ns3-td29301.html
Expected deliverables
While working on the previous projects I also intend to send patches to improve some parts of the ns3 code (such as the waf upgrade I sent last week). I plan to work during the first half on the MPTCP code and then on the per-node clock integration. The MPTCP code has the priority though since this is the most awaited feature I believe.
Validation
I intend to validate MPTCP against DCE. This may require some synergy with the TCP validation project.
Week 1 - Step 1
- Modify tcp-option.h to support MPTCP
- (de)Serialization of the numerous MPTCP options
Week 2 - Deliverable for Step 1; start of Step 2
- Add MPTCP crypto
The following was the initial plan but it may be postponed:
- Adapt TcpSocketBase to be more flexible (making all function virtual, overload some functions with TcpHeaders parameters instead of flags etc...)
- Same for TcpXxBuffer
Week 3 - Step 2
- Addition of test scripts, to trace buffers
Week 4 - Step 2
- put DCE infrastructure into place
Week 5 - Deliverable for Step 2 and Step 3
- Implement linux MPTCP schedulers to be able to compare
- Implement OLIA/LIA congestion controls
Week 6 - Step 3
- MPTCP may still need some polishing at this point
Week 7 - Deliverable for Step 3; start of Step 4
- Addition of a Clock m_clock member in each Node.
- (Peter) Consider adding the clock by aggregation instead. I haven't thought this through, but I think aggregation will make it easier to manipulate the clock through the Config system, for example.
- Addition of a perfect clock (default behavior won't change)
- Addition of a drifting clock with initial offset
Week 8 - Step 4
- making ntpd work in DCE
Week 9 - Step 4
- making ntpd work in DCE (indeed: that looks complex)
Week 10 - Step 4 and Deliverables for Step 3 and 4
- test the whole thing
- Add some tests/documentation
Weekly progress
Week 1 - Step 1
Sticking to the plan. In summary, this week has delivered the following:
- (de)Serialization of the 7/8 mptcp options with their documentation
- Associated testsuite
The implementation of these messages can be found on the repository. For more details check the wiki. During next week, while waiting for a clearer schedule over the mptcp work, I plan to:
- Add the pending mptcp crypto testsuite (depends on the discussion)
- Continue the work I've started in the background on netlink export from DCE. This is something I started long ago but it proves quite difficult, since wireshark can't dissect raw netlink: it expects it to be contained within a "cooked linux" header that libpcap generates but DCE does not (yet). The current DCE netlink implementation does not work with NTPd, which is why I am looking into it. (Netlink is the linux communication protocol between kernel and userspace.)
- Send some patches to DCE to support ntpd
Week 2 - Step 2
This week has delivered the following:
- Generation of the MPTCP token/initial data sequence number based on libgcrypt when available (or otherwise on an ns3 implementation not compliant with the standard). waf configure seems to detect libgcrypt correctly but it does not seem to pass on the -DENABLE_CRYPTO flag. This is something I have to look into.
- Associated testsuite
- Generated a diff of the MPTCP implementation against ns3.19 to help reviewers get a feeling of the architecture.
- I also got a better understanding of a DCE bug.
The implementation of these messages can be found on the repository. For more details check the wiki. This weekend got surprisingly hectic for me so I did not have the time to clean my DCE tree. For next week:
- I have good hope of finding & fixing my DCE bug. Once this is done, I will have a respectable amount of code to clean and push.
- Fix the ENABLE_CRYPTO flag detection
- Review and maybe merge nat's code
- Try to generate mptcp graphs with DCE + linux kernel
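For readers unfamiliar with the crypto part, the token and the initial data sequence number (IDSN) mentioned this week are both derived from the 64-bit MPTCP key. As I read RFC 6824, the token is the most significant 32 bits of the SHA-1 hash of the key and the IDSN is the least significant 64 bits of the same hash. The sketch below uses Python's hashlib rather than libgcrypt, and the function name is hypothetical (it is not the ns3 API).

```python
import hashlib


def mptcp_token_and_idsn(key: bytes):
    """Derive (token, idsn) from an 8-byte MPTCP key, following RFC 6824."""
    if len(key) != 8:
        raise ValueError("an MPTCP key is 64 bits long")
    digest = hashlib.sha1(key).digest()         # 20 bytes
    token = int.from_bytes(digest[:4], "big")   # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")   # least significant 64 bits
    return token, idsn


# Arbitrary key chosen for illustration only.
token, idsn = mptcp_token_and_idsn(bytes(range(8)))
```

Since wireshark recomputes the token from the key seen in the handshake, a non-standard hash in the simulator would prevent it from grouping subflows of the same connection.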
Week 3 - Step 3
This week has delivered the following:
- fixed the netlink bug (to help me in that task, I extended wireshark's netlink dissector)
- About the libgcrypt problem, it doesn't provide pkg-config so I found a waf alternative that can check for its presence without it. Test nearly finished.
- Rather than sending patches dependent on one another, I've pushed the clock support code for DCE and ns3. This seems to launch ntpd but I have no idea if it works.
I've not had the time to look at mptcp dce as I had (ambitiously xD) proposed.
For the upcoming week:
- prepare the mptcp crypto code for the June 20 review
- understand if clock synchronization happens or not and fix what needs to be fixed.
https://code.wireshark.org/review/#/c/8916/1/epan/dissectors/packet-netlink.c
https://github.com/direct-code-execution/ns-3-dce/issues/2
https://github.com/teto/ns-3-dce/commits/clock_support
https://github.com/teto/ns-3-dev-git/commits/clock_support
Week 4 - Step 4
This week has delivered the following:
- I have modified my testing script so that it can run with different ntp programs: ntimed-client, ntpd, chronyd. Ntpd has a huge codebase so I prefer to work for now with simpler programs, i.e. chronyd & ntimed.
- Openntpd portable can now compile for DCE but I have not tested it yet
- I wrote a script that evaluates the number of unimplemented DCE syscalls compared to what a binary calls. It got merged in DCE. It can be interesting to evaluate the complexity of running a program within DCE. It is worth mentioning that not all the missing syscalls need to be implemented to run the program within DCE. For instance the ntpd binary calls 80 syscalls missing from DCE, but the part of the program I use doesn't call these 80 syscalls, so ntpd works fine.
- got some fixes on the mptcp crypto part
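The idea behind the syscall coverage script boils down to a set difference. The sketch below is not the script merged in DCE: the function names and the input lists are made up for illustration, and it assumes the two symbol lists have already been extracted (for instance from the binary's dynamic symbol table and from the DCE sources).

```python
def missing_symbols(imported_by_binary, implemented_by_dce):
    """Return the symbols the binary imports that DCE does not provide."""
    return sorted(set(imported_by_binary) - set(implemented_by_dce))


def coverage_report(imported_by_binary, implemented_by_dce):
    missing = missing_symbols(imported_by_binary, implemented_by_dce)
    return {
        "imported": len(set(imported_by_binary)),
        "missing": len(missing),
        "names": missing,
    }


# Made-up inputs for illustration only.
binary_imports = ["socket", "recvfrom", "adjtimex", "settimeofday", "socket"]
dce_symbols = {"socket", "recvfrom", "sendto"}
report = coverage_report(binary_imports, dce_symbols)
```

The count is only an upper bound on the work needed: a missing symbol matters only if the code path actually exercised in the simulation calls it.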
I have not yet been able to draw plots of the clock offsets. This is what I plan for next week. So far, programs can't steer the clock (adjtimex/settimeofday are not implemented); this is something I also hope to accomplish next week, but I have little idea of how complex it can be.
https://code.wireshark.org/review/#/c/8916/1/epan/dissectors/packet-netlink.c
https://github.com/openntpd-portable/openntpd-portable/issues/13
https://github.com/direct-code-execution/ns-3-dce/pull/4
https://github.com/teto/ns-3-dev-git/commits/libcrypto
The initial schedule of my gsoc got reversed due to the presence of the other TCP Gsoc. I preferred to wait for these changes to come first before rebasing my previous work on MPTCP. I am also eager to read comments about the architecture of the MPTCP implementation (link at the top of this page https://www.nsnam.org/wiki/GSOC2015MpTcpImplementation#On_the_MPTCP_subject).
Here is a list of repositories with the work done so far:
- https://github.com/teto/ns-3-dev-git/commits/libcrypto : this branch contains a cleaned version of the mptcp options (de)serialization code. I've added some files about the cryptographic processes used in MPTCP. I did not test this part extensively enough.
- https://github.com/teto/ns-3-dev-git/commits/clock_support WIP: this branch implements an aggregatable virtual clock that can be set on a per-node basis. This can be used to model how node synchronization can play a role in different protocols. It can also be used with the following DCE version.
- https://github.com/teto/ns-3-dce/commits/clock_support : this DCE fork can't work without the specific ns3 branch previously mentioned. This branch fixes an important bug in the netlink implementation of DCE and adds documentation.
This week hasn't delivered much in sound code as I was trying to understand how to simulate the clock. I've discovered the simulator  and proposed a similar way to simulate time. It was rightfully objected by my mentors that it would change ns3 into a timestep simulator. I had to come up with an alternate design which still needs some feedback. My proposal is to add Node::Schedule members that would convert node's time to simulation time. Whenever the node's clock frequency is changed or stepped, it should cancel the registered events and reregister them with an updated simulator time.
Now the translation between node and simulator time is not that easy, even in the simple case to which I limit myself (I only plan to support adjtime and not adjtimex). The clock's frequency has 3 potential components: the background frequency + the ntp correction + a singleshot ntp correction that is removed as soon as the singleshot offset is compensated for.
So for instance when translating from node's time to sim time, you potentially have to distinguish between 2 phases, the one with the singleshot frequency and the one without.
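The two-phase conversion can be sketched as follows. This is my own simplified model, not the code in the branch: rates are expressed in node seconds per simulator second, `slew_left` is the node-clock offset that the singleshot correction still has to compensate, and all parameter names are hypothetical.

```python
def node_to_sim_duration(node_dt, base_rate, ntp_rate, slew_rate, slew_left):
    """Convert a duration measured on the node clock into simulator time.

    Phase 1 runs at base_rate + ntp_rate + slew_rate until slew_left has
    been absorbed; phase 2 runs at base_rate + ntp_rate.
    """
    steady = base_rate + ntp_rate
    if steady <= 0:
        raise ValueError("the node clock must move forward")
    if slew_rate == 0 or slew_left == 0:
        return node_dt / steady          # no singleshot phase at all

    fast = steady + slew_rate
    slew_sim = slew_left / slew_rate     # simulator time spent in phase 1
    if fast <= 0 or slew_sim < 0:
        raise ValueError("inconsistent singleshot correction")
    slew_node = fast * slew_sim          # node time elapsed during phase 1

    if node_dt <= slew_node:
        return node_dt / fast            # the whole duration fits in phase 1
    return slew_sim + (node_dt - slew_node) / steady
```

For example, with a background rate of 1.0, no ntp correction, a singleshot rate of 0.5 and 1.0 s left to compensate, phase 1 lasts 2.0 s of simulator time and covers 3.0 s of node time; a node delay of 5.0 s therefore maps to 4.0 s of simulator time.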
I've started some experimental implementation adding C++11 flags to quickly relay Node::Schedule calls to Simulator::Schedule (replaced with legacy templates if the branch proves successful). This doesn't compile yet. My plan is to have a test for the clock conversion covering different frequencies.
I've updated the wiki with the midterm report, which mainly consists of a patch for the mptcp options. I've added some crypto code to it, so I would not advise it for immediate inclusion (as I had previously done) since I am not sure how stable the crypto API will be.
This week thanks to the help of Peter, Steve & David, I've started the API to allow scheduling events based on a node's clock. Instead of using Simulator::Schedule, one would use Node::Schedule that would relay these calls to Simulator::Schedule with a corrected timestep depending on the node's clock. Compared to last week's design, we only schedule the first node event in the node queue and as soon as that event is finished, the next event of the node is pushed to the simulator's queue. This way, if the frequency of the clock changes, you only have one event (already pushed to the Simulator) that you have to cancel & repush with a new timestep.
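The "only the head event lives in the simulator queue" design can be illustrated with a toy event loop. This is a standalone sketch, not ns3 code: the `Simulator` and `Node` classes below are simplified stand-ins for their ns3 namesakes, the clock model is a single rate (node seconds per simulator second), and all method names are assumptions.

```python
import heapq
import itertools


class Simulator:
    def __init__(self):
        self.now = 0.0
        self._queue = []                 # heap of (time, event id, callback)
        self._ids = itertools.count()
        self._cancelled = set()

    def schedule(self, delay, callback):
        event_id = next(self._ids)
        heapq.heappush(self._queue, (self.now + delay, event_id, callback))
        return event_id

    def cancel(self, event_id):
        self._cancelled.add(event_id)

    def run(self):
        while self._queue:
            when, event_id, callback = heapq.heappop(self._queue)
            if event_id in self._cancelled:
                continue
            self.now = when
            callback()


class Node:
    def __init__(self, sim, rate=1.0):
        self.sim = sim
        self.rate = rate                 # node seconds per simulator second
        self._node_ref = 0.0             # node time at the last rate change
        self._sim_ref = 0.0              # simulator time at the last rate change
        self._pending = []               # heap of (node time, seq, callback)
        self._seq = itertools.count()
        self._head_id = None             # the single event known to the simulator

    def now(self):
        return self._node_ref + (self.sim.now - self._sim_ref) * self.rate

    def schedule(self, node_delay, callback):
        heapq.heappush(self._pending,
                       (self.now() + node_delay, next(self._seq), callback))
        self._push_head()

    def set_rate(self, rate):
        self._node_ref, self._sim_ref = self.now(), self.sim.now
        self.rate = rate
        self._push_head()                # only one event to cancel and repush

    def _push_head(self):
        if self._head_id is not None:
            self.sim.cancel(self._head_id)
            self._head_id = None
        if self._pending:
            delay = (self._pending[0][0] - self.now()) / self.rate
            self._head_id = self.sim.schedule(delay, self._fire)

    def _fire(self):
        self._head_id = None
        _, _, callback = heapq.heappop(self._pending)
        callback()
        self._push_head()                # hand the next node event to the simulator
```

When the rate changes, the events still waiting in the node's own queue keep their node timestamps untouched; only the head event needs a new simulator time.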
The idea was to write tests along with the implementation. So I've started writing 2 tests:
1/ One test is for the clock ("clock-test.cc"). It sets a frequency and checks that the clock time matches the tested skew against simulator time. It is possible to change the frequency during the test to check that it calls the test callback. It should also be possible to test against timesteps but this is not implemented yet.
2/ The 2nd test is in "node-schedule-test" and is equivalent to simulator-test-suite but with calls like m_node->Schedule( Time(3), ...). It also checks that events trigger according to the node's clock. There are no Node::*Destroy functions as they don't seem fundamental yet.
The tests don't pass yet, so for next week the plan would be:
- to pass the simple cases
- as Nat finished the bulk changes on the TCP stack, to start working on a rebase for MPTCP.
As proposed in week 6:
- I fixed the clock tests so that they now pass.
- I submitted a patch to change a few things in pcap handling to unblock some
- started rebasing the mptcp implementation on nat's branch
My planning for next week is:
- to upstream most of the DCE changes I wrote during the first Gsoc part (netlink fix, some syscalls, doc...)
- keep working on mptcp rebasing
https://github.com/teto/ns-3-dev-git/tree/clock_support
https://github.com/teto/ns-3-dev-git/commits/netlink_ok
https://github.com/teto/ns-3-dev-git/commit/f6cd305de0378b028d1ca9275a8f5d8b7835b437
I've not been able to complete the tasks, partly because I had less time to work due to personal issues and partly because the tasks I had in mind required some interactions. I spent time investigating how to improve the MPTCP codebase. While the initial codebase copies a lot of code to minimize changes, this makes it hard to maintain. Also, the TCP changes made by Nat, though interesting, break how the initial MPTCP-aware socket was created. In a way this is fortunate, since it motivated me to look into MPTCP fallback. So far the user had to create an MPTCP socket and it would fail if the peer was not MPTCP compliant. I hope to minimize the changes needed by the user to ease the use of MPTCP. This was not in the initial proposal and I've had a hard time defining an API, but I believe we should do it as in the (linux) kernel:
- the client sends the MPTCP options during the SYN but does not instantiate any MPTCP related structure
- upon SYN/ACK, it generates MPTCP structures, copies the current socket (the copy becomes the "master" socket) and uses the socket seen by the application as the meta socket
In the case of ns3, this approach is more problematic because of the Ptr<Socket> indirection. Thus I've used "placement new" to replace a TcpSocketBase instance with an MpTcpSocketBase. This is a bit tricky and this is as far as I went.
So to sum up, for next week I have to:
- address comments on time simulation
- keep working on MPTCP
This week has been really tough in the sense that I had to deal with 2 problems in domains I was not very confident with. First, I had to change the link order so that libgcrypt got linked first, which is not that simple if you are not familiar with waf. Many thanks to Alex Afanasyev who really saved me there. The libgcrypt license is compatible with ns3. As discussed on the mailing list, it is an optional dependency and it felt more natural than duplicating sha1 in ns3. This should make it possible to read mptcp pcaps in wireshark, which relies on the token to identify the subflows of a same connection. The second problem was dealing with the placement new. It seemed to create bugs in libc ("double linked list corrupted") and, as I was unsure it was valid C++, I looked at the libc code, only to realize that I had in fact left a "CopyObject<TcpSocketBase>" in the code that was not allocating enough memory to convert this TcpSocketBase into an MpTcpSocketBase.
As for the TCP design, what I've done is to remove the TcpSocketBase::ReadOptions that was reading options even when the packet was discarded later on. I contextualize option reading as in linux with functions such as "ProcessTcpOptionsSynSent". I've also introduced a function "TcpSocketBase::IsTcpOptionEnabled(TcpOption::Kind)" which is easier to remember than the boolean for each option.
As for next week, development should be a lot smoother:
- Vedranm pushed the patch that was preventing me from proposing DCE PRs.
- I should be able to have a single subflow scenario running
This past week, I had planned to upstream some DCE code and I had to hook HookDefaultSink with the (header,packet) version. Instead of adding a new function in the PcapHelper, I thought ns3 could bind the correct callback depending on the signature, something like if you can't bind to X, try Y etc...
By the way, I don't know why this function is a template since it is equivalent to "void PcapHelper::HookDefaultSink (Ptr<Object> object, std::string tracename, Ptr<PcapFileWrapper> file)".
So I dived into the details of TraceConnectWithoutContext and the callback mechanism. I looked into Callback::DoCheckType, which is supposed to compare callbacks, but apparently it accepts any callback (if none is set, it accepts; otherwise there is a dynamic_cast which I suspect always succeeds even when the callbacks are not compatible). All of the compile time information disappears at runtime. So my idea was to retain the difference in types within CallbackBase, since due to object slicing (during the upcast, the most derived members disappear) using virtual functions is "prohibited". My idea is to set a member in CallbackBase that registers the std::type_info of the callback. I've made a first attempt in a specific branch that allows callbacks to be compared a lot better than in the current ns3. The drawback of the current approach is that it is not possible to run typeid(*this) in the Callback constructor since the object is still being constructed. Thus I run it afterwards, but the problem is that if MakeNullCallback was not called, then the type checks don't work. If we had C++11, we could use an std::tuple and it would always work (I think). I understand the previous explanations may not make sense at all, but to sum up, the added callback comparisons would not work without my changes. I don't think I will have the time to work more on this so I put it on hold (since MPTCP takes precedence).
I've sent a patch  to flush pcap writing so that packets get recorded even in case of a crash.
As for the MPTCP stack, I've kept on improving the design. IMO the design is much better than the original one, but in doing so I broke many features and I am slowly putting things in order again. There used to be a mix between the meta socket and subflows, while the separation is now clearer, with the meta socket really becoming a shim layer without any actual packet. As such, single path communications start sending data but the closing phase is not working yet.
For the following week, I plan to fix the MPTCP closing phase so that 1 path communications work flawlessly and then move forward to the multiple subflow case.
https://github.com/teto/ns-3-dev-git/blob/add_sink_with_header
https://github.com/teto/ns-3-dev-git/blob/add_sink_with_header/src/core/test/callback-test-suite.cc#L744
https://github.com/nsnam/ns-3-dev-git/pull/3
https://github.com/teto/ns-3-dev-git/tree/merge_nat
This is my final report for the Gsoc 2015 on Clock support & MpTcp Implementation on Linux.
I will sum up the work done and some propositions for after the Gsoc.
- During the first half of my Gsoc, I worked on allowing each node to run its own clock. This way each node can have its own time. It makes sense for testing the difference in behavior of protocols that depend on node synchronization. The clock takes some parameters such as the raw frequency, which can change over time. In the end every event must be queued into the main simulator class. To limit the addition/update (when clock parameters change)/removal of events, we only queue the next node event in the global Simulator. Thus if the node clock changes, it's enough to reschedule that event. Once the event is fired, we schedule the next node event. During this process, I was also able to run an unmodified ntpd (the Network Time Protocol daemon) in DCE with specific DCE patches.
- As for the MPTCP implementation, I got a much cleaner version than the initial one. There is now a clear difference between the "meta socket" (seen by the applications) and the TCP subflows (traditional TCP connections). The old meta was sending packets etc., while now it focuses on the logic between subflows. Also it used to be a modified copy/paste of TcpSocketBase, while now subflows and meta integrate with it, resulting in less code duplication. I've also improved the TCP/MPTCP compatibility, i.e., MPTCP is now a TCP option like any other, while you used to create a specific MPTCP socket. For instance, connecting an MPTCP socket to a legacy TCP one was impossible before the Gsoc and is now possible. I managed to exchange a file between 2 nodes with 1 subflow. The multiple subflow case is trickier and, while a second subflow can be established, I have to do more testing before assessing anything else.
I've also sent a few patches to:
- upgrade waf to 1.8 (with vedran's help mostly)
- provide a script to allow YouCompleteMe (a vim completion engine) to work with ns3
I don't think either of the 2 projects is ready for review yet. Don't get me wrong, I have working tests in each case, but each project is intertwined with other ns3 components (the TCP stack, DCE, event handling) and thus requires specific attention to the API. I don't think I am quite there yet. Once I am, I will clean/document the code and send official reviews. You can already have a look at it but the documentation may not be quite there yet.