Bug 1751 – TCP should react to MTU changes


Summary:	TCP should react to MTU changes

Status:	CONFIRMED

Product:	ns-3
Classification:	Unclassified
Component:	internet
Version:	ns-3-dev
Hardware:	All All

Importance:	P5 enhancement
Assigned To:	natale.patriciello


Reported:	2013-08-12 00:57 EDT by Tommaso Pecorella
Modified:	2016-04-08 09:34 EDT (History)
CC List:	2 users (show)


Description Tommaso Pecorella 2013-08-12 00:57:57 EDT

After the introduction of Path MTU in IPv6, it would be nice to have TCP automatically adjust its segment size according to the MTU *and* not treat a packet discarded for exceeding the MTU as a "loss" (as per the standard).

This can be done in various ways. In IPv6 it is safe to check the underlying MTU; in IPv4 it would probably be best to implement the L4 MTU check (there's an RFC describing it).

Comment 1 Tommaso Pecorella 2014-09-09 04:12:31 EDT

Packetization Layer Path MTU Discovery (PLPMTUD)
http://tools.ietf.org/html/rfc4821

Comment 2 natale.patriciello 2016-04-08 03:21:53 EDT

(In reply to Tommaso Pecorella from comment #1)
> Packetization Layer Path MTU Discovery (PLPMTUD)
> http://tools.ietf.org/html/rfc4821

I am going to read the RFC. Some questions before I start:

*) Can the MTU change dynamically on a network (e.g. it was 1000 before, and during a TCP connection it becomes 800)?
*) If yes, a new layer of problems arises; I'd like to assume "no".
*) If no, is TCP signaled, or does it need to actively check the MTU? If the latter, how can it know the header size of the layers below in general? (E.g. I send down 800 bytes, but the packet is dropped because it exceeds the MTU. A hypothetical GetMtu method returns 1000, and the 1000 limit is exceeded because of the lower layers' headers.)

Comment 3 Tommaso Pecorella 2016-04-08 04:20:20 EDT

(In reply to natale.patriciello from comment #2)
> (In reply to Tommaso Pecorella from comment #1)
> > Packetization Layer Path MTU Discovery (PLPMTUD)
> > http://tools.ietf.org/html/rfc4821
> 
> I am going to read the RFC. Some questions before I start:
> 
> *) Can the MTU change dynamically on a network (e.g. it was 1000 before,
> and during a TCP connection it becomes 800)?
> *) If yes, a new layer of problems arises; I'd like to assume "no".
> *) If no, is TCP signaled, or does it need to actively check the MTU? If
> the latter, how can it know the header size of the layers below in general?
> (E.g. I send down 800 bytes, but the packet is dropped because it exceeds
> the MTU. A hypothetical GetMtu method returns 1000, and the 1000 limit is
> exceeded because of the lower layers' headers.)

Unfortunately, the answer is yes: the PMTU can (and will) change during a connection.
The changes could be due to rerouting through a new path with a different PMTU and/or to PMTU probing algorithms.
PMTU probing algorithms may momentarily increase the datagram size to check whether rerouting onto a path with a *larger* PMTU has happened. As a consequence, what you'd see is at best one lost datagram, at worst an increase followed by a decrease of the PMTU. As a side note, I'd consider the second effect an implementation error.
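
To make the probing behaviour concrete, here is a minimal sketch of a probe-size search by bisection. It is not ns-3 code: the function names and the `path_accepts` callback are hypothetical, the bounds are called `search_low` and `search_high` after the terms used in RFC 4821, and bisection is only one possible way to choose the next probe size. It assumes `search_low` is already known to work and that the path does not change during the search.

```python
def next_probe_size(search_low: int, search_high: int) -> int:
    """Pick the next probe size: the midpoint of the bounds, rounded up."""
    return (search_low + search_high + 1) // 2


def update_bounds(search_low: int, search_high: int,
                  probe_size: int, delivered: bool) -> tuple:
    """Narrow the search range after one probe.

    A delivered probe raises the lower bound; a lost probe lowers the
    upper bound. A lost probe is not treated as a congestion signal here.
    """
    if delivered:
        return probe_size, search_high
    return search_low, probe_size - 1


def discover_pmtu(search_low: int, search_high: int, path_accepts) -> int:
    """Run the search to completion against a (hypothetical) path oracle.

    path_accepts(size) must return True if a datagram of that size
    would be delivered end to end.
    """
    while search_low < search_high:
        probe = next_probe_size(search_low, search_high)
        search_low, search_high = update_bounds(
            search_low, search_high, probe, path_accepts(probe))
    return search_low
```

For example, `discover_pmtu(1000, 1500, lambda size: size <= 1280)` converges on 1280. Every rejected probe in that run is a datagram the sender loses on purpose, which is the "at best a datagram lost" case above.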

Comment 4 Tommaso Pecorella 2016-04-08 04:22:42 EDT

(In reply to natale.patriciello from comment #2)
> (In reply to Tommaso Pecorella from comment #1)
> > Packetization Layer Path MTU Discovery (PLPMTUD)
> > http://tools.ietf.org/html/rfc4821
> 
> I am going to read the RFC. Some questions before I start:
> 
> *) Can the MTU change dynamically on a network (e.g. it was 1000 before,
> and during a TCP connection it becomes 800)?
> *) If yes, a new layer of problems arises; I'd like to assume "no".
> *) If no, is TCP signaled, or does it need to actively check the MTU? If
> the latter, how can it know the header size of the layers below in general?
> (E.g. I send down 800 bytes, but the packet is dropped because it exceeds
> the MTU. A hypothetical GetMtu method returns 1000, and the 1000 limit is
> exceeded because of the lower layers' headers.)

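I forgot the second question.
The ICMP too big message reports the MTU of the link the packet could not be forwarded on, i.e. the largest IP datagram that link can carry. The router signalling the drop does not take the upper-layer header sizes into account, so TCP has to subtract the IP and TCP header sizes itself to get its segment size.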
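
As a worked example of that arithmetic, the sketch below derives a TCP segment size from an MTU value reported by an ICMP too big message. The function name and its parameters are hypothetical, not an ns-3 API. The header sizes are the minimum ones (IPv4 without options, the fixed IPv6 header without extension headers, TCP without options); anything beyond that has to be passed in explicitly.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
IPV6_HEADER = 40  # bytes, fixed IPv6 header without extension headers
TCP_HEADER = 20   # bytes, TCP header without options


def segment_size_for_mtu(mtu: int, ipv6: bool = False,
                         tcp_options: int = 0) -> int:
    """Return the largest TCP payload that fits in one datagram of size mtu.

    tcp_options is the number of bytes of TCP options carried in every
    segment (for example 12 when timestamps are in use).
    """
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    payload = mtu - ip_header - TCP_HEADER - tcp_options
    if payload <= 0:
        raise ValueError("MTU too small to carry any TCP payload")
    return payload
```

With a reported MTU of 1500, `segment_size_for_mtu(1500)` gives 1460 for IPv4, and `segment_size_for_mtu(1280, ipv6=True)` gives 1220 for the IPv6 minimum link MTU.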

Comment 5 natale.patriciello 2016-04-08 09:29:57 EDT

OK, so after reading the RFC, one possible strategy is to split the problem into two steps:

1) If a packet is discarded but we receive an ICMP too big message, it's not a congestion signal.

2) Implement MTU discovery so that the user can enable or disable it. I imagine it disabled by default, with the user enabling it selectively, which is the current Linux situation: /proc/sys/net/ipv4/tcp_mtu_probing (more information: http://kb.pert.geant.net/PERTKB/PathMTU).


Regarding 1), I think we are missing a "roll-off" feature such as the one present in Linux. I mean, we receive three duplicate ACKs, the cwnd is halved, and so on, but then we receive an ICMP too big: we need to roll the cwnd and ssthresh back to their old values.
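
A minimal sketch of such a roll-off, to show the state that would have to be kept. This is not the ns-3 TCP code nor the Linux implementation: the class and method names are hypothetical, the window is counted in segments, and the reduction rule is a simplified halving. It also assumes the ICMP too big message refers to the same segment that triggered the duplicate ACKs, which a real implementation would have to check.

```python
class CongestionState:
    """Toy congestion state with a one-level undo (units: segments)."""

    def __init__(self, cwnd: int, ssthresh: int):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self._saved = None  # (cwnd, ssthresh) from before the last reduction

    def on_triple_dupack(self) -> None:
        """Remember the current values, then apply a simplified halving."""
        self._saved = (self.cwnd, self.ssthresh)
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh

    def on_icmp_too_big(self) -> None:
        """The drop was caused by the MTU, not congestion: undo the reduction."""
        if self._saved is not None:
            self.cwnd, self.ssthresh = self._saved
            self._saved = None

    def on_recovery_complete(self) -> None:
        """Recovery ended without an ICMP message: the reduction stands."""
        self._saved = None
```

Starting from `CongestionState(cwnd=20, ssthresh=64)`, three duplicate ACKs bring both values to 10, and a following ICMP too big restores 20 and 64.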

Comment 6 Tommaso Pecorella 2016-04-08 09:34:46 EDT

(In reply to natale.patriciello from comment #5)
> OK, so after reading the RFC, one possible strategy is to split the problem
> into two steps:
> 
> 1) If a packet is discarded but we receive an ICMP too big message, it's not
> a congestion signal.
> 
> 2) Implement MTU discovery so that the user can enable or disable it. I
> imagine it disabled by default, with the user enabling it selectively, which
> is the current Linux situation: /proc/sys/net/ipv4/tcp_mtu_probing (more
> information: http://kb.pert.geant.net/PERTKB/PathMTU).
> 
> 
> Regarding 1), I think we are missing a "roll-off" feature such as the one
> present in Linux. I mean, we receive three duplicate ACKs, the cwnd is
> halved, and so on, but then we receive an ICMP too big: we need to roll the
> cwnd and ssthresh back to their old values.

I agree. Moreover, IPv4 fragments packets by default. As a consequence, we *also* need to implement the Don't Fragment option, set at socket level. I'll take a look into this.

Just one note about the roll-off: the ICMP message could arrive before, or even in the middle of, the three dupacks.
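
To illustrate the ordering issue, here is a small sketch that remembers which segments were reported as too big, so the decision taken at the third dupack does not depend on when the ICMP message arrived. The class, its method names, and the returned action strings are all hypothetical; it only classifies the event and leaves the actual window handling out.

```python
class LossClassifier:
    """Decide what the third dupack for a segment should trigger."""

    def __init__(self):
        self._too_big = set()  # sequence numbers reported by ICMP too big
        self._dupacks = {}     # sequence number -> dupacks seen so far

    def on_icmp_too_big(self, seq: int) -> None:
        """Record that the segment starting at seq was dropped for its size."""
        self._too_big.add(seq)

    def on_dupack(self, seq: int) -> str:
        """Count a duplicate ACK for seq and return the action to take."""
        count = self._dupacks.get(seq, 0) + 1
        self._dupacks[seq] = count
        if count != 3:
            return "wait"
        if seq in self._too_big:
            # ICMP arrived before or between the dupacks: resend with a
            # smaller segment size, but do not reduce cwnd.
            return "retransmit_without_reduction"
        # No ICMP seen yet: normal fast retransmit. If the ICMP message
        # arrives later, the reduction has to be rolled back afterwards.
        return "fast_retransmit"
```

Only the case where the ICMP message arrives after the third dupack needs the old cwnd and ssthresh to be restored; in the other two cases the reduction can simply be skipped.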