From Nsnam
Revision as of 21:32, 29 August 2019 by Liangcheng-yu (Project Summary Report)

Return to GSoC 2019 Projects page.

Project Overview

  • Project name: Framework of Studying Flow Completion Time Minimization for Data Center Networks
  • Abstract: This project aims to make NS-3 a more effective simulation tool for researchers working on contemporary Data Center Network (DCN) topics. The theme of the project is to augment NS-3 with further support for DCN simulation and research, with a special focus on flow-based performance optimization, e.g., by implementing useful modules in the NS-3 ecosystem such as Multi-Level Feedback Queue scheduling, a spine-leaf topology helper, and a flow statistics analysis helper.
  • About Me: I will soon join the University of Pennsylvania as a first-year Ph.D. student focusing on Computer Systems and Networking. I obtained my master's degree in Wireless Systems at KTH Royal Institute of Technology, Sweden, and my Bachelor of Engineering in Automatic Control at Zhejiang University, China.

Technical Approach

The big picture of the project is to augment DCN support in NS-3. This project specifically looks at supporting flow-based performance optimization, which involves multiple aspects including packet scheduling, packet tagging, topology helpers, performance evaluation helpers, and load balancing. Multi-Level Feedback Queue (MLFQ) scheduling and Shortest Job First (SJF) are used to minimize Flow Completion Time (FCT) without or with knowledge of the flow size, respectively. The spine-leaf topology is often used in DCN simulation to evaluate proposals, and metrics such as the average FCT and the 99th percentile FCT are the indicators used to analyze proposals involving novel scheduling, congestion control, routing, or load balancing approaches for DCNs. The implementation will align with classical and state-of-the-art DCN research by incorporating these necessary components.
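As a concrete illustration of the two metrics mentioned above, the following is a minimal standalone sketch (hypothetical function names, not an ns-3 helper) of how the average and a tail percentile of per-flow completion times could be computed, using the simple nearest-rank percentile definition:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helpers (not part of ns-3): compute mean and nearest-rank
// percentile over a list of per-flow completion times (e.g., in seconds).
double
MeanFct (const std::vector<double> &fct)
{
  double sum = 0.0;
  for (double t : fct)
    {
      sum += t;
    }
  return sum / fct.size ();
}

double
PercentileFct (std::vector<double> fct, double p)
{
  std::sort (fct.begin (), fct.end ());
  // Nearest-rank: the smallest sample such that at least p percent of
  // all samples are less than or equal to it.
  std::size_t rank = static_cast<std::size_t> (
      std::ceil (p / 100.0 * fct.size ()));
  return fct[rank - 1];
}
```

In a real evaluation these numbers would be derived from the flow statistics collected during the simulation (e.g., via FlowMonitor); the sketch only shows the arithmetic of the metrics themselves.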

Milestones and Deliverables

This project aims to make NS-3 more friendly for researchers working on contemporary DCN research topics, so we will adjust the deliverables to best serve that purpose. The entire GSoC period is divided into 3 phases. For now, the deliverables at the end of each phase are as follows:

Phase 1

  • MLFQ queue disc to support MLFQ scheduling with related examples, tests, documentation.
  • Corresponding Packet Tag and packet filter for PrioQueueDisc to support MLFQ.
  • Leaf-spine DCN topology helper to help quickly set up the DCN environment and corresponding documentation.
  • NetAnim support for the topology helper and the example.

Phase 2

  • Implement per-flow ECMP [RFC 2992] with the corresponding example and documentation.
  • Implement the flowlet-switching load balancing algorithm with the corresponding example and documentation.
  • Implement Shortest Job First scheduling.

Phase 3

  • Recycle the BCube and FatTree helpers and add an implementation of the DCell topology.
  • Example program(s) for data center networking simulation in NS-3: create a comprehensive DCN simulation example combining congestion control, load balancing, and scheduling, aligned with a representative work (SIGCOMM/NSDI) in this domain.

Weekly Plan

Below is the weekly plan including the core technical details.

Week1 May 27 - June 2

  • Implement core functions of Multi-Level Feedback Queue (MLFQ) as a new queueing discipline for traffic control layer src/traffic-control/model/mlfq-queue-disc.{h,cc}.
  • Introduction of MLFQ functionality:
    • MLFQ is installed at network end hosts to mimic Shortest Job First (flows with a smaller flow size in bytes are prioritized over those with a larger flow size) in order to minimize the Flow Completion Time of the network. The main configurations include the number of priorities K (max 16) and the K-1 thresholds (in bytes) separating the priorities.
    • The MLFQ end host maintains a hash table mapping the flow tuple (source IP, destination IP, source port, destination port, protocol) to the flow priority, and another table mapping it to the transmitted bytes.
    • When a packet is forwarded from the IP layer to MLFQ (Traffic Control layer), MLFQ checks the flow table and increments the transmitted bytes (if the flow does not exist, it initializes the flow entry with 0 bytes and the top priority 0).
    • If the flow's historically transmitted bytes exceed the configured threshold for the next lower priority, the current priority is demoted (e.g., from priority 0 to priority 1). This information is tagged onto the packet (e.g., in the ToS/DSCP field). The packet is then forwarded to one of the K FIFO queues based on its priority for transmission.
    • Network switches incorporate K priority queues (e.g., the existing PrioQueueDisc) and, upon receiving a packet, enqueue it based on the priority tagged on the packet.
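The per-flow bookkeeping and demotion steps above can be sketched as follows. This is a standalone illustration under assumed names (MlfqState, Classify), not the MlfqQueueDisc implementation, and a precomputed flow hash stands in for the 5-tuple lookup:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Sketch of MLFQ per-flow state: thresholds[i] is the cumulative byte count
// at which a flow is demoted from priority i to i+1 (K-1 thresholds, K bands).
struct MlfqState
{
  std::vector<uint64_t> thresholds;        // ascending, size K-1
  std::map<uint64_t, uint64_t> sentBytes;  // flow hash -> transmitted bytes

  // Called once per packet on the egress path; returns the priority band
  // (0 = highest) to tag onto the packet.
  uint8_t
  Classify (uint64_t flowHash, uint32_t packetBytes)
  {
    uint64_t &bytes = sentBytes[flowHash]; // new flows start at 0 bytes
    bytes += packetBytes;
    uint8_t prio = 0;
    for (uint64_t t : thresholds)
      {
        if (bytes <= t)
          {
            break;
          }
        ++prio; // crossed a threshold: demote one band
      }
    return prio;
  }
};
```

With thresholds {1000, 10000} a flow keeps priority 0 for its first 1000 bytes, moves to priority 1 until 10000 bytes, and stays at priority 2 afterwards, which is the Shortest-Job-First-like behavior described above.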
  • Discussions
    • MLFQ (at the network edges) performs scheduling in the egress direction (during packet transmission, IP => TrafficControl => NetDevice). For network switches, an ingress packet (NetDevice => TrafficControl => IP) first travels bottom-up and then top-down, passing through the PrioQueue before transmission (IP => TrafficControl => NetDevice). I would like to confirm whether the Traffic Control layer performs scheduling only in the egress direction.
    • How to tag priority information for the packet?
      • One way is to use a custom PacketTag (a subclass of Tag) and insert it, carrying the priority information (max 16 priorities, so uint8_t), into the packet. Such a tag would not be lost when the tagged packet is passed from the Traffic Control layer to the NetDevice (which might perform fragmentation based on the MTU). We would need some changes to the existing PrioQueueDisc (src/traffic-control/model/prio-queue-disc.{cc,h}) in the DoEnqueue method, which would optionally extract the priority info and classify the packet into the corresponding FIFO queue.
      • Ideally, I feel it would be better to tag the priority info directly into the packet header bytes (e.g., the ToS/DSCP field). Any suggestions?
    • On the maintenance of the flow hash table:
      • We plan to extract the flow tuple (source IP, destination IP, source port, destination port, protocol) from the packet and hash the information to identify the flow that the incoming packet belongs to when MLFQ performs scheduling. Any advice on the best practice?
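One possible way to turn the 5-tuple into a table key, in the spirit of the hashing done by FqCoDel's packet filters, is to serialize the fields and hash the resulting string. This is an assumed sketch for discussion, not the eventual ns-3 code:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical flow hash over the 5-tuple (addresses as host-order uint32_t
// for simplicity). A delimiter keeps distinct tuples from serializing to the
// same string (e.g., ports 12/3 vs. 1/23).
uint64_t
FlowHash (uint32_t srcIp, uint32_t dstIp,
          uint16_t srcPort, uint16_t dstPort, uint8_t protocol)
{
  std::ostringstream oss;
  oss << srcIp << '|' << dstIp << '|'
      << srcPort << '|' << dstPort << '|' << unsigned (protocol);
  return std::hash<std::string> () (oss.str ());
}
```

The resulting 64-bit key would index the two tables mentioned above (flow priority and transmitted bytes); hash collisions would merge the byte counts of two flows, which is tolerable for scheduling but worth noting in the documentation.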

Week2 June 3 - June 9

  • Add test cases for MLFQ scheduling.
  • Add examples of using MLFQ scheduling with simple network topologies and compare the mean flow completion time with other scheduling policies in the example.

Week3 June 10 - June 16

  • Implement the spine-leaf topology helper to help set up the DCN environment, and the corresponding animation.

Week4 June 17- June 23

  • Refine the example for leaf-spine topology helper.
  • Implement the FlowSink receiver application.
  • Start the initial implementation of load balancing techniques such as flow-based ECMP.

Week5 June 24- June 30

  • Implement the SJF scheduling.
  • Improve the MLFQ scheduling implementation, tests, examples, and documentation based on feedback.

Week6 July 1 - July 7

  • Finalize the scheduling components of this project.

Week7 July 8 - July 14

  • Complete the full implementation of flow-based ECMP and the related documentation.

Week8 July 15 - July 21

  • Fix the remaining issues with the merge request of scheduling components (SjfQueueDisc and MlfqQueueDisc).
  • Finalize the implementation and doc for the load-balancing section.

Week9 July 22 - July 28

  • Finish the MR for the topology helpers in the new data-center module.

Week10 July 29 - August 4

  • Create a DCN simulation example that covers a typical DCN simulation practice.
  • Address the review comments on the data-center module.

Week11 August 5 - August 11

  • Fix issues from the DCN simulation example.
  • Pack up the topology helpers and the full DCN simulation example as an MR.

Week12 August 12 - August 19

  • Recap the project and address the review comments on the MRs.
  • Complete the final evaluation.

Weekly Progress

Below is the weekly progress including the core technical details.

Community Bonding Period

  • Prepared wiki page for the project and the introduction text to the NS-3 developer mailing list.
  • Met with mentors and discussed the details of the proposal.
    • Consider the traces and variables to be exposed to the users (e.g., TCP window size, queueing delay, ...).
    • Check FqCoDel for a similar hashing implementation.
    • Complete the full cycle of each NS-3 feature (implementation, tests, example, documentation) before moving to the next.
  • Got familiar with the previous DCN project, DCTCP (GSoC 2017).
  • Forked the ns-3-dev repo (synced with commit eda73a35) on GitLab and created a sample branch for review.

Week1 May 27 - June 2

  • Implemented Multi-Level Feedback Queue scheduling with mlfq-queue-disc.{cc,h} as a new class of the TC layer.
  • Added flow-prio-tag.{cc,h} to store the priority information as a PacketTag in the packets and flow-prio-packet-filter.{cc,h} to classify packets with the tag to the corresponding FIFO queue.

Week2 June 3 - June 9

  • Added the MLFQ rst documentation, validation tests, and user examples.

Week3 June 10 - June 16

  • Added the helper for leaf-spine topology commonly used in data center networking and the example with animation.

Week4 June 17 - June 23

  • Completed the leaf-spine topology with the documentation and the enhanced example.

Week5 June 24 - June 30

  • Implemented the SJF scheduling and gradually improved the previous MLFQ scheduling implementation based on feedback.
  • Discussed the updated plan for phases 2 and 3.

Week6 July 1 - July 7

  • Completed the SJF scheduling tests, examples, and documentation.
  • Migrated the previous leaf-spine helper to the new data-center module.
  • Completed a rough initial implementation of flow-based ECMP support.

Week7 July 8 - July 14

  • Extended Ipv4GlobalRouting with flow-based ECMP and flowlet switching, which are often used in DCNs for load balancing.

Week8 July 15 - July 21

  • Fixed the previous DCN scheduling implementation for the MR.
  • Prepared the MR extending the existing per-packet ECMP with common load balancing algorithms.
  • Improved the topology helpers from the previous DCTCP project.

Week9 July 22 - July 28

  • Fixed the issue of calculating header bytes and addressed the comments from MR #75.
  • Completed the full commit recycling the BCube and FatTree helpers into the data-center module.

Week10 July 29 - August 4

  • Created a DCN simulation example that covers a typical DCN simulation practice.
  • Implemented the DCell helper.

Week11 August 5 - August 11

  • Fixed issues of the DCell implementation.
  • Packed up the topology helpers and the full DCN simulation example as an MR.

Week12 August 12 - August 19

  • Fixed the data-center module.
  • Completed the final evaluation.

Project Summary Report

This project looks at supporting data center networking (DCN) simulation with NS-3, allowing users to evaluate DCN proposals more effectively. In the project I added two scheduling algorithms (Shortest Job First and Multi-Level Feedback Queue) and helpers for common data center topologies, and extended NS-3 with common load balancing options such as per-flow ECMP and flowlet switching.

The project can therefore be summarized along three dimensions:

1. DCN scheduling: This includes the ideal Shortest Job First algorithm in SjfQueueDisc and the realistic Multi-Level Feedback Queue in MlfqQueueDisc. Both scheduling algorithms can be applied to prioritize short flows and therefore optimize the average flow completion time in a DCN, which is one of the core metrics for evaluating DCN system designs.

2. Load balancing options: This section extends the existing NS-3 per-packet ECMP option with two further path selection algorithms, namely flow-based ECMP and simple flowlet switching. Packets can thus be distributed not only per packet, but also at the granularity of a flow or a flowlet.
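The flowlet idea can be illustrated with a small standalone sketch (hypothetical names, not the actual Ipv4GlobalRouting extension): a flow keeps its cached path while its packets arrive within the flowlet timeout, and a longer gap starts a new flowlet that may be assigned a fresh path:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Sketch of flowlet switching. SelectPath is called per packet; nextPath is
// the path ECMP would pick if this packet starts a new flowlet.
struct FlowletTable
{
  uint64_t timeoutUs; // flowlet gap threshold in microseconds

  struct Entry
  {
    uint64_t lastSeenUs;
    uint32_t path;
  };
  std::map<uint64_t, Entry> table; // flow hash -> flowlet state

  uint32_t
  SelectPath (uint64_t flowHash, uint64_t nowUs, uint32_t nextPath)
  {
    auto it = table.find (flowHash);
    if (it != table.end () && nowUs - it->second.lastSeenUs < timeoutUs)
      {
        it->second.lastSeenUs = nowUs;   // same flowlet: keep the path
        return it->second.path;
      }
    table[flowHash] = {nowUs, nextPath}; // gap exceeded: new flowlet
    return nextPath;
  }
};
```

The timeout is typically chosen larger than the path delay difference so that reordering a new flowlet onto another path cannot interleave it with in-flight packets of the previous burst.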

3. Environment helpers: A DCN is a cluster of switches and servers connected in a specific pattern. The project recycled the BCube and FatTree helpers from the previous DCTCP project and created a new data-center module with implementations of LeafSpine and DCell, both common topologies in DCNs.
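For scale intuition about the recursive DCell topology: per the original DCell construction, a DCell_0 is n servers on one switch, and a DCell_k combines t_{k-1} + 1 copies of DCell_{k-1}, so the server count grows as t_k = t_{k-1} * (t_{k-1} + 1). A tiny sketch (hypothetical function, not the data-center module's API):

```cpp
#include <cassert>
#include <cstdint>

// Number of servers in a DCell_k built from n-port switches, following the
// recursion t_0 = n, t_k = t_{k-1} * (t_{k-1} + 1) from the DCell paper.
uint64_t
DCellServers (uint32_t n, uint32_t k)
{
  uint64_t t = n;
  for (uint32_t i = 0; i < k; ++i)
    {
      t = t * (t + 1);
    }
  return t;
}
```

Even with 4-port switches a DCell_2 already holds 420 servers, which is why topology helpers that automate wiring and addressing matter for these simulations.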

The commits and code patches can be found on GitLab, specifically in the dcn, scheduling, and load-balancing branches.