- Project Name: Direct Code Execution Modernization
- Student: Parth Pratim Chatterjee
- Mentors: Tom Henderson, Apoorva Bhargava, Vivek Jain
- Project Goals: DCE currently uses net-next-nuse to expose Linux kernel internals, such as the networking stack, to host applications, but the project has not kept up with recent Linux kernel releases. As the kernel evolved, a major part of its source code changed, so the existing glue code is incompatible with newer implementations of the network stack; several init calls and function usages changed significantly, which makes migration to newer releases non-trivial. This project aims to enable support for the latest Linux kernel features and toolchains in the DCE environment, including the socket networking stack, sysctl interfaces and system call access, without any changes to the user APIs currently used by host applications. It also aims to incorporate LKL (Linux Kernel Library) into the DCE environment so that host applications can use the Linux kernel stacks with minimal to no change in existing simulation scripts.
- About Me: I'm a freshman Computer Science undergraduate student at Kalinga Institute of Industrial Technology, Bhubaneshwar, India. I have a keen interest in Linux internals and computer networking. I was a grand prize winner at Google Code-in 2018 for the ns-3 organization, which is how I was first introduced to DCE. I have an aptitude for competitive programming and make heavy use of C/C++, the STL and other OOP concepts in solving algorithmic puzzles. I have more than 3 years of experience with C/C++ and Python, working on projects for numerous hackathons.
Milestones and Deliverables
- Phase 1
- Add links to kernel exported functions in KernelHandle, used in DCE.
- Fill up DceHandle struct with function references to DCE exported functions.
- Implement all the host functions, link them to custom-written mutex, semaphore and pthread functions, create the lkl_host_ops structure, and initialize the kernel with lkl_start_kernel.
- Implement the sim_init method.
- Phase 2
- Add support for socket networking and netdevices interface functions which will be exported to DCE.
- Phase 3
- Add support for the new struct naming conventions.
- Add support for liblkl by default in bake and the ns-3-dce wscript, including custom gcc parameters like -fpermissive (without which the header files won't compile).
- Align the sysctl usage in LinuxSocketFdFactory according to LKL.
- Modify existing examples to load liblkl.so. Fix errors and bugs that might appear in this stage.
- I decided to test the scheduler fix for LKL that I had come up with earlier.
- Idea : The accept system call works as follows. For AF_INET sockets it executes inet_accept(), which sets up a few locks and other state, and control ultimately reaches inet_wait_for_connect(). This function puts the current thread into the TASK_INTERRUPTIBLE state and issues a schedule_timeout(), which arms a timer (via mod_timer) to wake the calling thread after a given number of jiffies has passed. Meanwhile, it schedules other tasks from the run queue, so that other work can run while the thread that issued accept waits for the call to return the file descriptor of the newly opened socket. This is straightforward on a regular operating system: with two applications, a server and a client, the Linux scheduler runs the client while the server waits in accept, the client issues connect(), and the scheduler then switches back to the server, where accept() returns a file descriptor on which send and recv operations can be performed (scheduled in the same way). With DCE, however, this isn't that straightforward.
- First and foremost, DCE doesn't run on actual threads; it uses fibres (LWP), creating contexts and switching back and forth between them using TaskManager and the fibre switch handlers. A few important functions are TaskManager::(TaskStart/TaskWakeup/TaskWait/Schedule/TaskSwitch...). Since we load LKL as a shared library, the kernel scheduler inside the LKL space has no access to the native threads we work with in DCE. Thus, when any system call invokes the scheduler, LKL keeps scheduling the tasks created inside the LKL environment (ksoftirqd, tasklets, workqueues, kernel_threads, IRQ handlers) and never reaches the DCE threads, which leaves DCE execution at a standstill on blocking operations such as accept/connect/send/recv. Apart from normal syscalls, socket functions like send register a namespace skb destructor, which is pushed to a workqueue for later execution when sk_free is called. The workqueue in turn depends on schedule to run the rescuer kernel threads created to dequeue the tasks pushed to it.
- Solution 1 : I thought that if the problem is not reaching the DCE threads/fibres, then let's override the LKL scheduler to schedule the DCE threads first and the LKL tasks afterwards; since LKL is a uniprocessor system (which is another problem that I discuss later in this report), this shouldn't make much of a difference (at least that's how it seemed to me). So, I overrode the schedule_timeout and schedule functions to put the current DCE thread to sleep for the given timeout and start the DCE scheduler with TaskManager::Schedule (a private function, so I had to create some public functions to run it), and then ran the scheduler inside LKL.
- Report 1 (did not pass) : I hadn't noticed that some of the init calls under start_kernel, defined in init/main.c, require the scheduler. The problem is that we first need to call lkl_start_kernel(...), which runs the init functions; since we had already prioritized the DCE threads over the kernel threads, the kernel didn't finish initializing and simply jumped to the other DCE thread, which started creating the socket before the network interface was even initialized. The funny part is that the program didn't crash; it just got stuck again.
- Solution 2 : I concluded that overriding the global scheduler was the problem, so I set that approach aside and made the patch more specific: scheduler calls are made only within inet_wait_for_connect, avoiding changes to the universal LKL scheduler.
- Report 2 : The init calls now worked as before, and after the accept call the system switched over to the other DCE task. But the socket(...) function got stuck the same way as in Report 1.
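The scheduling problem described above can be illustrated with a small toy model. This is only a sketch using Python generators as stand-ins for fibres and kernel tasks; the names `server`, `client`, `kernel_worker`, `run` and `simulate` are hypothetical and do not correspond to any DCE or LKL API. It shows that a round-robin scheduler completes the accept only if the task that issues connect is visible to it, and spins on its own tasks otherwise.

```python
from collections import deque

def server(listen_queue, log):
    # Stand-in for accept(): yield ("sleep") until a connection request is queued.
    while not listen_queue:
        log.append("server: waiting in accept")
        yield
    log.append("server: accepted " + listen_queue.popleft())

def client(listen_queue, log):
    # Stand-in for connect(): queue a request that the server's accept can pick up.
    listen_queue.append("client-1")
    log.append("client: connect issued")
    yield

def kernel_worker(log):
    # Stand-in for tasks living inside the kernel (ksoftirqd, workqueues, ...):
    # always runnable, but never able to satisfy the pending accept.
    while True:
        log.append("kworker: tick")
        yield

def run(tasks, max_steps):
    """Round-robin over the tasks this scheduler can see.

    Returns True if the server task finished within max_steps.
    """
    queue = deque(tasks.items())
    for _ in range(max_steps):
        if not queue:
            break
        name, task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            if name == "server":
                return True
            continue
        queue.append((name, task))
    return False

def simulate(client_visible, max_steps=20):
    listen_queue, log = deque(), []
    tasks = {"server": server(listen_queue, log), "kworker": kernel_worker(log)}
    if client_visible:
        # Assumption for illustration: the client is on the same run queue.
        tasks["client"] = client(listen_queue, log)
    return run(tasks, max_steps), log

done_visible, log_visible = simulate(client_visible=True)
done_hidden, log_hidden = simulate(client_visible=False)
print(done_visible, done_hidden)
```

In the second run the client never appears on the scheduler's queue, which loosely mirrors DCE fibres being invisible to the scheduler inside LKL: the server keeps waiting while only the kernel-side worker makes progress.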
- Why did this happen?
- It turns out it's not our fault; it's a consequence of how LKL was designed to work. Remember, I mentioned that LKL is a uniprocessor system. When we initialize LKL with lkl_start_kernel, it does a number of things, including running init calls that set up the thread and cpu mutex locks and semaphores. One of the most important steps is lkl_cpu_change_owner(lkl_pthread_t), which operates on the cpu variable of type lkl_cpu. lkl_cpu has many fields; some of the important ones are :
- lock : a mutex lock to decide which host thread currently holds access
- owner : thread id of the current cpu lock owner
- count : number of times the current thread has acquired the lock
Now, the lkl_cpu_change_owner function checks that the count is not > 1, i.e. the lock has not been handed out more than once to any thread (including itself), and then changes the owner to the given thread, keeping the count intact.
- So, when our socket(...) system call is issued through the LKL syscall APIs, it enters lkl_syscall(...), which first has to acquire the cpu lock using lkl_cpu_get(...). Suppose Thread 1 ran lkl_start_kernel and the first syscall (namely accept) and thereby already holds the lock. If, while that call is in progress, the scheduler switches to Thread 2, which issues another syscall (socket(...) in our case), Thread 2 waits on the mutex until the first call returns and calls lkl_cpu_put(), freeing the lock for Thread 2 to acquire.
- There's something else to keep in mind. Calling lkl_cpu_change_owner for every syscall would be inconsistent and would very likely lead to failures, and even where it doesn't fail on its own, LKL's design guarantees that it eventually will. LKL works on the idea that only one host thread holds the lock. So when we manually make the previous thread drop the lock using lkl_cpu_put, mark the current thread as a host thread using set_thread_flag(TIF_HOST_THREAD) and then change the cpu owner, the subsequent syscalls fail because cpu.count keeps increasing, while it has an upper bound of 1.
- There's one legal way of changing cpu owners: through the default LKL scheduler function schedule(...), which performs a context_switch(...) and calls the __switch_to(prev, next) function defined in arch/lkl/threads.c. This legally changes the cpu owner from prev to next, where prev and next are of type struct task_struct.
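The ownership rules described above can be sketched as a toy model. This is not LKL's implementation: `ToyCpu` and `try_get` are hypothetical names, the blocking mutex is replaced by a non-blocking call that reports whether the caller would have had to wait, and the `count > 1` check simply mirrors the description given in this report.

```python
class ToyCpu:
    """Simplified stand-in for the lkl_cpu state described in the text."""

    def __init__(self):
        self.owner = None  # id of the thread currently holding the CPU
        self.count = 0     # how many times the owner has acquired it

    def try_get(self, thread_id):
        # Stand-in for lkl_cpu_get(): succeeds only if the CPU is free or
        # already owned by the caller. The real call would block instead.
        if self.owner is None or self.owner == thread_id:
            self.owner = thread_id
            self.count += 1
            return True
        return False

    def put(self, thread_id):
        # Stand-in for lkl_cpu_put(): only the owner may release the CPU.
        if self.owner != thread_id or self.count == 0:
            raise RuntimeError("put by a thread that does not own the CPU")
        self.count -= 1
        if self.count == 0:
            self.owner = None

    def change_owner(self, new_owner):
        # Stand-in for lkl_cpu_change_owner(): refuses when count > 1 and
        # otherwise hands the CPU over, keeping the count intact.
        if self.count > 1:
            raise RuntimeError("cannot change owner while count > 1")
        self.owner = new_owner

cpu = ToyCpu()
t1_in_accept = cpu.try_get("thread-1")    # Thread 1 enters accept()
t2_blocked = not cpu.try_get("thread-2")  # Thread 2's socket() would wait here
cpu.put("thread-1")                       # accept() returns and releases the CPU
t2_after_put = cpu.try_get("thread-2")    # only now can Thread 2 proceed
```

In this model Thread 2 can make progress only after Thread 1 releases the CPU, which corresponds to the stuck socket(...) call: Thread 1 is itself waiting in accept for a client that never gets to run.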
- I then decided to work simultaneously on writing an SMP interface for LKL, based on the x86 and arm64 architectures, and on porting net-next-nuse to Linux-5.12.
- I opened a thread on the LKL developers list to get expert suggestions on the design and implementation of the project's integration into DCE : LKL Integration into DCE
- net-next-nuse port : Linux-4.5 Successful Build and Linux-5.12 defconfig generation report :
- I could get Linux-4.5 to work (tested it with DCE too) with some work. To be specific, I had to do the following things :
- The net-next-nuse Makefile under arch/lib uses objcopy and nm to rename some symbols exported by the kernel through EXPORT_SYMBOL(...), mapping each to rumpns_<symbol-name>. net-next-nuse usually reimplements these symbols itself, because only a few to_keep files are compiled, both to reduce the size of the shared library and to keep control over certain important parts of the kernel such as the paging service, the scheduler and the proc_sysctl interface. Some functions in Linux-4.5 are written differently than in Linux-4.4; for example, put_page (responsible for handling compound or single pages) is no longer exported and has been replaced with __put_page, so I had to rewrite the implementation a bit.
- net-next-nuse also introduces slib under memory management, a memory page slab allocator that makes use of some dce/nuse routines too; it draws much inspiration from the slob allocator (which is not linked into the library).
- I had to modify parts of include/linux/slab.h to honour the CONFIG_SLIB macro, so that only the functions rewritten in mm/slib.c (custom written) are included in the final build, as the option is already enabled in Kconfig.
- Set up the proc_sysctl interface based on the following commit
- I think moving over to Linux-4.6 wouldn't be too difficult either, though I haven't tried it yet.
- I was a little too curious to get my hands on the latest Linux kernel, so I tested Linux-5.12 too. With the changes mentioned below, I could get the first of the two steps needed to generate the net-next-nuse library to complete successfully (i.e. make defconfig ARCH=lib).
- Identified changes in the defconfig process, so I had to change where the defconfig is stored, which should now specifically be under arch/$(ARCH)/configs, and also how defconfig is invoked from the Kconfig Linux build configuration file.
- The Kconfig scripting language has evolved quite a bit since the net-next-nuse-4.4 release, so I had to modify the way Kconfig uses environment variables according to this commit by linux
- A significant amount of work is yet to be done to get the second step (make library ARCH=lib) to work. I'm working on it too.
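The symbol-renaming step mentioned in the Linux-4.5 port (mapping exported kernel symbols to rumpns_<symbol-name>) can be sketched as follows. This is an illustration only, not the net-next-nuse Makefile logic: `build_redefine_map` is a hypothetical helper, and the sketch assumes the mapping is passed to objcopy through its `--redefine-syms=<file>` option, which reads one `old new` pair per line. The symbol list is a made-up example apart from __put_page, which is mentioned above.

```python
def build_redefine_map(exported_symbols, prefix="rumpns_"):
    """Return lines in the 'old new' format read by objcopy --redefine-syms."""
    lines = []
    for sym in sorted(set(exported_symbols)):
        if sym.startswith(prefix):
            continue  # already renamed, leave it alone
        lines.append(f"{sym} {prefix}{sym}")
    return lines

# Hypothetical input: in practice the names would come from `nm` output
# filtered down to the symbols exported via EXPORT_SYMBOL(...).
example_symbols = ["__put_page", "kfree", "kfree", "rumpns_schedule"]
mapping = build_redefine_map(example_symbols)
print("\n".join(mapping))
```

Writing the resulting lines to a file and handing that file to objcopy would rename every listed symbol in one pass; duplicates and already-prefixed names are skipped so the mapping stays idempotent.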
- I also worked on creating a Docker setup of ns-3-dce, which has the following advantages :
- Reduces the current 12+ GB disk usage to 7 GB (just 720 MB more than the previous release). This additional space is due to a patch I made that enables ns-3-dce to build using a custom Glibc-2.31 setup, so that features like vtable hijacking, used by dce to redirect functions like fopen and fseek to custom dce-defined implementations, remain usable.
- In the dce-docker-beta folder (my git repo) there is a bake directory, and any changes made there get synced with the bake directory inside the docker container, so users can both run simulation scripts and make development changes to all the projects in the current bake installation.
- My beta docker ns-3-dce test image can be tested using the following commands(on any machine which can run docker and docker-compose) :
Note : Currently docker has to be run under sudo, but there are ways to avoid this (by creating a docker user group and adding the current user to the group), and an init script that has to be run just once could be created for this.
sudo docker-compose up -d
sudo docker exec -w / -it ns-3-dce ./setup
sudo docker exec -w /bake/source/ns-3-dce -it ns-3-dce ./waf --run dce-linux-simple
To stop the docker instance
sudo docker-compose down
- I also worked on fixes for the CircleCI setup of ns-3-dce and created a PR for them. I'll keep working on it based on Matt Sir's suggestions : https://github.com/direct-code-execution/ns-3-dce/pull/115