Difference between revisions of "New TCP Socket Architecture"

From Nsnam

Revision as of 21:25, 31 May 2010

In the following, a new architecture for the TCP socket implementation is proposed. It is intended to replace the old TcpSocketImpl class in NS-3.8 so that different flavors of TCP can be implemented easily.

The current working directory is located at http://code.nsnam.org/adrian/ns-3-tcp

Old Structure

As of changeset 6273:8d70de29d514 in the Mercurial repository, TCP simulation is implemented by the class TcpSocketImpl, in src/internet-stack/tcp-socket-impl.h and src/internet-stack/tcp-socket-impl.cc. The TcpSocketImpl class implements TCP NewReno, even though its Doxygen comment claims that it implements Tahoe.

The TcpSocketImpl class is derived from the TcpSocket class, which in turn is derived from the Socket class. The TcpSocket class is merely an empty class defining the interface for getting and setting attributes. Examples of the attributes configured through the TcpSocket interface are the send and receive buffer sizes and the initial congestion window size. The Socket class, in contrast, provides the interface that the L7 application calls.

How to use TcpSocketImpl

TCP state machine transitions are defined in tcp-l4-protocol.h and tcp-l4-protocol.cc. The class TcpSocketImpl does not maintain the transition rules; it only keeps track of the current state of the socket.

When an application needs a TCP connection, it has to get a socket from TcpL4Protocol::CreateSocket(). This call allocates a TcpSocketImpl object and configures it (e.g. assigns it to a particular node). There is one TcpL4Protocol object per TCP/IP stack, and it serves as a mux/demux layer for the real sockets, namely the TcpSocketImpl objects.

Once the TcpSocketImpl object is created, it is in the CLOSED state. The application can instruct it to Bind() to a port number and then Connect() or Listen(), as with a traditional BSD socket.

The Bind() call registers the socket's port in the TcpL4Protocol object as an Ipv4EndPoint and sets up the callback functions (by way of FinishBind()) so that mux/demux can be done.
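
The mux/demux role described above can be illustrated with a minimal, self-contained sketch. It is not the ns-3 code: ToyDemux, its methods, and the use of std::string as a payload are all hypothetical, and the sketch assumes that demultiplexing is keyed on the destination port only, which is a simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the port table kept by the L4 protocol object.
// Names and signatures are illustrative only, not the ns-3 API.
class ToyDemux
{
public:
  using RxCallback = std::function<void (const std::string &payload)>;

  // Register a port together with the callback that receives its packets.
  // Returns false if the port is already taken.
  bool Bind (uint16_t port, RxCallback cb)
  {
    assert (cb); // a bound endpoint must have a receiver
    return m_endPoints.emplace (port, std::move (cb)).second;
  }

  // Remove the endpoint, as happens when the socket is closed.
  void Unbind (uint16_t port) { m_endPoints.erase (port); }

  // Hand a packet to the socket bound to the destination port.
  // Returns false if no socket is bound there.
  bool Receive (uint16_t dstPort, const std::string &payload) const
  {
    auto it = m_endPoints.find (dstPort);
    if (it == m_endPoints.end ())
      {
        return false;
      }
    it->second (payload);
    return true;
  }

private:
  std::map<uint16_t, RxCallback> m_endPoints; // port -> receive callback
};
```

In this sketch, a socket would call Bind() once with a callback that plays the part of ForwardUp(), and Unbind() when it is closed.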

The Listen() call puts the socket into the LISTEN state. The Connect() call, on the other hand, puts the socket into the SYN_SENT state and initiates the three-way handshake.

The application can close the socket by calling Close(), which in turn destroys the Ipv4EndPoint after the FIN packets have been exchanged.

Once the socket is ready to send, the application invokes the Send() call in TcpSocketImpl. The receiver-side application calls Recv() to get the packet data.

Inside TcpSocketImpl

The operation of TcpSocketImpl is carried out by two parallel mechanisms. To communicate with higher-level applications, the Send() and Recv() calls deal with the buffers directly: they append data to the send buffer and retrieve data from the receive buffer, respectively. To send and receive data over the network through the lower layers, the functions ProcessEvent(), ProcessAction(), and ProcessPacketAction() are called.

Two functions are crucial in triggering these three process functions. The function ForwardUp() is invoked when the lower layer (Ipv4L3Protocol) receives a packet that is destined for this TCP socket. The function SendPendingData() is invoked whenever the application has appended anything to the send buffer.

ForwardUp() converts the incoming packet's TCP flags into an event. It then updates the current state machine by calling ProcessEvent() and performs the subsequent action with ProcessPacketAction(). ProcessEvent() handles only connection setup and teardown. All other cases are handled by ProcessPacketAction() and ProcessAction(). The function ProcessPacketAction() handles the cases that need to reference the TCP header of the packet; the other cases are handed over to ProcessAction().

SendPendingData() manages the send window. When the send window is big enough to send a packet, it extracts data from the send buffer, packages it with a TCP header, and passes it to the lower layers.
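
As a rough illustration of the flags-to-event-to-state flow, the following sketch covers only the three-way handshake. The enum names, FlagsToEvent(), and NextState() are hypothetical and do not reproduce the tables in tcp-l4-protocol.cc; the flag bit values are the standard TCP header bits. Leaving the state unchanged on an unexpected event is an assumption made to keep the sketch short.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the TCP states and events (handshake only).
enum class State { CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED };
enum class Event { APP_LISTEN, APP_CONNECT, SYN_RX, SYN_ACK_RX, ACK_RX, OTHER };

// Standard TCP header flag bits.
constexpr uint8_t kSyn = 0x02;
constexpr uint8_t kAck = 0x10;

// Convert the flags of an incoming packet into an event.
Event FlagsToEvent (uint8_t flags)
{
  if ((flags & kSyn) && (flags & kAck))
    {
      return Event::SYN_ACK_RX;
    }
  if (flags & kSyn)
    {
      return Event::SYN_RX;
    }
  if (flags & kAck)
    {
      return Event::ACK_RX;
    }
  return Event::OTHER;
}

// Look up the next state. Unexpected events leave the state unchanged,
// which is a simplification made for this sketch.
State NextState (State s, Event e)
{
  switch (s)
    {
    case State::CLOSED:
      if (e == Event::APP_LISTEN) return State::LISTEN;
      if (e == Event::APP_CONNECT) return State::SYN_SENT;
      break;
    case State::LISTEN:
      if (e == Event::SYN_RX) return State::SYN_RCVD;
      break;
    case State::SYN_SENT:
      if (e == Event::SYN_ACK_RX) return State::ESTABLISHED;
      break;
    case State::SYN_RCVD:
      if (e == Event::ACK_RX) return State::ESTABLISHED;
      break;
    case State::ESTABLISHED:
      break;
    }
  return s;
}
```

Here Listen() and Connect() correspond to the APP_LISTEN and APP_CONNECT events, and each incoming packet is fed through FlagsToEvent() before the state lookup.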
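
The window check can be sketched as a small standalone function. SegmentsToSend() and its parameters are hypothetical, all quantities are in bytes, and the rule that a segment is sent only when the remaining window can hold it completely is an assumption about what "big enough" means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the sizes of the segments that may be sent right now.
//   window      - current send window
//   inFlight    - bytes sent but not yet acknowledged
//   pending     - bytes waiting in the send buffer
//   segmentSize - maximum payload per segment
std::vector<uint32_t> SegmentsToSend (uint32_t window, uint32_t inFlight,
                                      uint32_t pending, uint32_t segmentSize)
{
  assert (segmentSize > 0);
  std::vector<uint32_t> segments;
  // Room left in the window after accounting for unacknowledged data.
  uint32_t available = (window > inFlight) ? window - inFlight : 0;
  while (pending > 0)
    {
      uint32_t size = std::min (pending, segmentSize);
      if (available < size)
        {
          break; // window too small for the next segment
        }
      segments.push_back (size);
      pending -= size;
      available -= size;
    }
  return segments;
}
```

Each returned size corresponds to one extraction from the send buffer followed by one packet handed to the lower layer.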

New Structure

The new structure, the TcpSocketBase class, has the same relationship to the TcpSocket and Socket classes as TcpSocketImpl. However, instead of providing a concrete TCP implementation, it is designed to meet the following goals:

  • Provide only the functionality common to all TCP classes, namely, the implementation of the TCP state machine
  • Minimize the code footprint and keep the code modular so that it is easier to understand

From a lower layer's point of view, TCP has not changed since 1980. The TCP state machine has remained the same. The only difference between the variants of TCP is in congestion control and fairness distribution. The TcpSocketBase class keeps the state machine operation, i.e. the ProcessEvent() and ProcessAction() calls, the same as in the TcpSocketImpl class. These functions, however, will be tidied up in the future.

In the current TcpSocketBase class, two auxiliary classes are used, namely, TcpRxBuffer and TcpTxBuffer.

The TcpRxBuffer is the receive (Rx) buffer for TCP. It accepts packet fragments at any position. The call TcpRxBuffer::Add() inserts a packet into the Rx buffer, obtaining the sequence number of the data from the provided TcpHeader. The Rx buffer has a maximum size, which defaults to 32 KiB and can be set by TcpRxBuffer::SetMaxBufferSize(). The sequence number of the head of the buffer can be set by TcpRxBuffer::SetNextRxSeq(). This is supposed to be called once the connection is established so that the buffer can report out-of-sequence packets. TcpRxBuffer handles all the reordering work, so TcpSocketBase can simply extract data from it in a single call, TcpRxBuffer::Extract().
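
The reordering behaviour can be sketched with a toy class. ToyRxBuffer is hypothetical: it takes a raw sequence number and a std::string instead of a packet and a TcpHeader, it ignores sequence number wraparound, and its method signatures are assumptions rather than the actual TcpRxBuffer interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Toy receive buffer: accepts segments in any order, releases them in order.
class ToyRxBuffer
{
public:
  void SetMaxBufferSize (std::size_t n) { m_maxSize = n; }
  void SetNextRxSeq (uint32_t seq) { m_nextRxSeq = seq; }
  uint32_t NextRxSeq () const { return m_nextRxSeq; }

  // Insert a segment. Returns false for empty or duplicate data, or if
  // the buffer would exceed its maximum size.
  bool Add (uint32_t seq, std::string data)
  {
    if (data.empty () || seq + data.size () <= m_nextRxSeq)
      {
        return false; // nothing new in this segment
      }
    if (seq < m_nextRxSeq)
      {
        data.erase (0, m_nextRxSeq - seq); // drop the part already received
        seq = m_nextRxSeq;
      }
    if (m_ready.size () + m_pendingBytes + data.size () > m_maxSize)
      {
        return false; // buffer full
      }
    if (seq > m_nextRxSeq)
      {
        // Out of order: park the segment until the gap is filled.
        bool stored = m_pending.emplace (seq, data).second;
        if (stored)
          {
            m_pendingBytes += data.size ();
          }
        return stored;
      }
    m_ready += data;
    m_nextRxSeq += static_cast<uint32_t> (data.size ());
    // Absorb parked segments that are now contiguous.
    while (!m_pending.empty () && m_pending.begin ()->first <= m_nextRxSeq)
      {
        auto it = m_pending.begin ();
        const std::string &seg = it->second;
        uint32_t skip = m_nextRxSeq - it->first;
        if (skip < seg.size ())
          {
            m_ready.append (seg, skip, std::string::npos);
            m_nextRxSeq += static_cast<uint32_t> (seg.size () - skip);
          }
        m_pendingBytes -= seg.size ();
        m_pending.erase (it);
      }
    return true;
  }

  // Remove and return up to maxBytes of in-order data.
  std::string Extract (std::size_t maxBytes)
  {
    std::size_t n = std::min (maxBytes, m_ready.size ());
    std::string out = m_ready.substr (0, n);
    m_ready.erase (0, n);
    return out;
  }

private:
  std::size_t m_maxSize = 32 * 1024;        // 32 KiB, as in the text
  uint32_t m_nextRxSeq = 0;                 // next in-order sequence number
  std::string m_ready;                      // contiguous, extractable data
  std::map<uint32_t, std::string> m_pending; // out-of-order segments
  std::size_t m_pendingBytes = 0;
};
```

In this sketch, a segment that arrives ahead of a gap stays in the pending map, and a single later Add() that fills the gap makes all contiguous data available to Extract().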

The TcpTxBuffer is the transmit (Tx) buffer for TCP. The upper-layer application sends data to TcpSocketBase, which then appends it to the TcpTxBuffer by calling TcpTxBuffer::Add(). As with TcpRxBuffer, the maximum buffer size can be set, here by TcpTxBuffer::SetMaxBufferSize(). Appending fails if the buffer would then hold more data than its maximum size. Because appending to TcpTxBuffer is supposed to be sequential, without overlap, TcpTxBuffer::Add() merely puts the data at the end of a list. TcpTxBuffer does, however, support extracting data from anywhere in the buffer, which is done by calling TcpTxBuffer::CopyFromSeq().
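
A toy version of the append-only list with random-access copy might look as follows. ToyTxBuffer is hypothetical: SetHeadSeq() is a helper invented for the sketch, std::string stands in for packet data, and the argument order of CopyFromSeq() is an assumption rather than the actual TcpTxBuffer signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>

// Toy transmit buffer: data is appended at the tail, copied from anywhere.
class ToyTxBuffer
{
public:
  void SetMaxBufferSize (std::size_t n) { m_maxSize = n; }
  // Hypothetical helper: sequence number of the first byte in the buffer.
  void SetHeadSeq (uint32_t seq) { m_headSeq = seq; }

  // Append data at the end of the list. Fails if the buffer would overflow.
  bool Add (const std::string &data)
  {
    if (m_size + data.size () > m_maxSize)
      {
        return false;
      }
    m_chunks.push_back (data);
    m_size += data.size ();
    return true;
  }

  // Copy up to numBytes starting at sequence number seq, without removing
  // anything from the buffer. Returns fewer bytes if the buffer ends first.
  std::string CopyFromSeq (std::size_t numBytes, uint32_t seq) const
  {
    std::string out;
    if (seq < m_headSeq)
      {
        return out; // data before the head is no longer buffered
      }
    std::size_t offset = seq - m_headSeq; // bytes to skip from the head
    for (const std::string &chunk : m_chunks)
      {
        if (out.size () == numBytes)
          {
            break;
          }
        if (offset >= chunk.size ())
          {
            offset -= chunk.size (); // this chunk lies before seq
            continue;
          }
        out.append (chunk, offset, numBytes - out.size ());
        offset = 0;
      }
    return out;
  }

private:
  std::size_t m_maxSize = 32 * 1024; // assumed default for the sketch
  std::size_t m_size = 0;            // bytes currently buffered
  uint32_t m_headSeq = 0;
  std::list<std::string> m_chunks;   // data in the order it was appended
};
```

Because CopyFromSeq() does not remove data, the same bytes can be copied again later, for example for a retransmission.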

Pluggable Congestion Control in Linux TCP

In include/net/tcp.h of the Linux kernel source:

 struct tcp_congestion_ops {
       struct list_head        list;
       unsigned long flags;
 
       /* initialize private data (optional) */
       void (*init)(struct sock *sk);
       /* cleanup private data  (optional) */
       void (*release)(struct sock *sk);
 
       /* return slow start threshold (required) */
       u32 (*ssthresh)(struct sock *sk);
       /* lower bound for congestion window (optional) */
       u32 (*min_cwnd)(const struct sock *sk);
       /* do new cwnd calculation (required) */
       void (*cong_avoid)(struct sock *sk, u32 ack, u32 in_flight);
       /* call before changing ca_state (optional) */
       void (*set_state)(struct sock *sk, u8 new_state);
       /* call when cwnd event occurs (optional) */
       void (*cwnd_event)(struct sock *sk, enum tcp_ca_event ev);
       /* new value of cwnd after loss (optional) */
       u32  (*undo_cwnd)(struct sock *sk);
       /* hook for packet ack accounting (optional) */
       void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
       /* get info for inet_diag (optional) */
       void (*get_info)(struct sock *sk, u32 ext, struct sk_buff *skb);
 
       char            name[TCP_CA_NAME_MAX];
       struct module   *owner;
 };
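
To show how such a table of hooks could be used, here is a user-space sketch in C++ with only the two required hooks. It is not kernel code: ToySock, CongestionOps, OnAck(), and OnLoss() are hypothetical, the window is counted in segments, and the Reno-like rules (including setting cwnd to ssthresh on loss) are simplifications chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical per-connection state, in units of segments.
struct ToySock
{
  uint32_t cwnd = 1;      // congestion window
  uint32_t ssthresh = 64; // slow start threshold
  uint32_t cwndCnt = 0;   // ACKs counted toward the next linear increase
};

// Table of hooks, loosely modeled on the two required entries above.
struct CongestionOps
{
  std::string name;
  uint32_t (*ssthresh) (const ToySock &sk); // new threshold after a loss
  void (*congAvoid) (ToySock &sk);          // called once per new ACK
};

// Reno-like rule: halve the window, but never go below two segments.
uint32_t RenoSsthresh (const ToySock &sk)
{
  return std::max<uint32_t> (sk.cwnd / 2, 2);
}

// Reno-like growth: one segment per ACK in slow start, then roughly one
// segment per window of ACKs.
void RenoCongAvoid (ToySock &sk)
{
  if (sk.cwnd < sk.ssthresh)
    {
      sk.cwnd += 1;
      return;
    }
  if (++sk.cwndCnt >= sk.cwnd)
    {
      sk.cwnd += 1;
      sk.cwndCnt = 0;
    }
}

const CongestionOps kRenoLike = { "reno-like", RenoSsthresh, RenoCongAvoid };

// The socket code only ever calls through the table.
void OnAck (ToySock &sk, const CongestionOps &ops)
{
  assert (ops.congAvoid);
  ops.congAvoid (sk);
}

void OnLoss (ToySock &sk, const CongestionOps &ops)
{
  assert (ops.ssthresh);
  sk.ssthresh = ops.ssthresh (sk);
  sk.cwnd = sk.ssthresh; // simplification made for this sketch
  sk.cwndCnt = 0;
}
```

A different TCP variant would then be a different CongestionOps table passed to the same OnAck() and OnLoss() calls.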

To be continued