Bug 1009 – Tx Sequence number assert fail in TCP

Bug 1009 - Tx Sequence number assert fail in TCP


Summary:	Tx Sequence number assert fail in TCP

Status:	RESOLVED FIXED

Product:	ns-3
Classification:	Unclassified
Component:	internet
Version:	ns-3.9
Hardware:	Mac Intel All

Importance:	P5 normal
Assigned To:	ns-bugs

URL:
Keywords:

Depends on:
Blocks:
	Show dependency tree / graph

Reported:	2010-10-18 09:59 EDT by Bill Roome
Modified:	2010-11-17 19:23 EST (History)
CC List:	1 user (show)

See Also:

Attachments
patch to test (1.16 KB, patch) 2010-10-24 00:50 EDT, Tom Henderson	Details \| Diff
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Bill Roome 2010-10-18 09:59:22 EDT

When 100's of TCP connections come and go on a slow network, I occasionally get this assert to fail in tcp_socket_impl.cc:

bool TcpSocketImpl::ProcessPacketAction (...)
{
  .....
    case NEW_ACK:
      .....
        NS_ASSERT(tcpHeader.GetAckNumber ()  <= m_nextTxSequence);

That's line 1024 in 3.9 released.

It seems like the problem is in the following code, which sets m_nextTxSequence to the sequence number saved in m_finSequence:

bool TcpSocketImpl::ProcessPacketAction (...)
{
  ....
    case ACK_TX:
      NS_LOG_LOGIC ("TcpSocketImpl " << this <<" Action ACK_TX");
      if(tcpHeader.GetFlags() & TcpHeader::FIN)
      {
        ++m_nextRxSequence; //bump this to account for the FIN
        m_nextTxSequence = m_finSequence;
      }
      SendEmptyPacket (TcpHeader::ACK);
      break;

That's lines 945-953 in 3.9 released.

There seem to be two problems with that. First, very often m_finSequence is 0, indicating that it had never been set. Second, m_finSequence may be less than the current m_nextTxSequence. Often it's only 1 less, which seems to be okay. However, once in a while it's 100's of packets less than m_nextTxSequence.

I worked around the problem by making two changes to that block. First, if m_finSequence is 0, I just ignore it -- I don't set m_nextTxSequence. Second, if m_finSequence is less than m_nextTxSequence, I also ignore it.

Eg, I changed that to:

    case ACK_TX:
      NS_LOG_LOGIC ("TcpSocketImpl " << this <<" Action ACK_TX");
      if(tcpHeader.GetFlags() & TcpHeader::FIN)
      {
        ++m_nextRxSequence; //bump this to account for the FIN
        if (m_finSequence == g_zeroSeqn)
          {
            NS_LOG_ERROR("TcpSocketImpl::ProcessPacketAction: ACK_TX+FIN finSeqn==0 ERROR:"
                         << " m_nextTxSequence=" << m_nextTxSequence);
          }
        else if (m_finSequence < m_nextTxSequence)
          {
            NS_LOG_ERROR("TcpSocketImpl::ProcessPacketAction: ACK_TX+FIN SEQUENCE ERROR:"
                         << " m_finSequence=" << m_finSequence
                         << " m_nextTxSequence=" << m_nextTxSequence);
          }
        else
          {
            m_nextTxSequence = m_finSequence;
          }
      }
      SendEmptyPacket (TcpHeader::ACK);
      break;

With those changes, everything works okay.

I realize this might not be the best solution; I'll leave it to someone who's more familiar with this code and TCP to decide whether this cures the disease or just treats the symptoms.

Of course, if you decide it's curing the disease, it would be better to add an explicit "m_finSequenceHasBeenSet" flag rather than assuming that 0 is always an invalid sequence number.

Comment 1 Bill Roome 2010-10-18 10:02:23 EDT

When 100's of TCP connections are created and closed on a slow network, occasionally this assert fails (tcp_socket_impl.cc, line 1024, 3.9 released):

bool TcpSocketImpl::ProcessPacketAction (...)
{
  .....
    case NEW_ACK:
      .....
        NS_ASSERT(tcpHeader.GetAckNumber ()  <= m_nextTxSequence);

It seems like the problem is in the following code, which sets m_nextTxSequence to the sequence number saved in m_finSequence (lines 945-953 in 3.9 released):

bool TcpSocketImpl::ProcessPacketAction (...)
{
  ....
    case ACK_TX:
      NS_LOG_LOGIC ("TcpSocketImpl " << this <<" Action ACK_TX");
      if(tcpHeader.GetFlags() & TcpHeader::FIN)
      {
        ++m_nextRxSequence; //bump this to account for the FIN
        m_nextTxSequence = m_finSequence;
      }
      SendEmptyPacket (TcpHeader::ACK);
      break;

There seem to be two problems with that. First, very often m_finSequence is 0, indicating that it had never been set. Second, sometimes m_finSequence is less than the current m_nextTxSequence. Often it's only 1 less, which seems to be okay. However, once in a while it's 100's of packets less than m_nextTxSequence.

I worked around the problem by making two changes to that block. First, if m_finSequence is 0, I just ignore it -- I don't set m_nextTxSequence. Second, if m_finSequence is less than m_nextTxSequence, I also ignore it.

Eg, I changed that to:

    case ACK_TX:
      NS_LOG_LOGIC ("TcpSocketImpl " << this <<" Action ACK_TX");
      if(tcpHeader.GetFlags() & TcpHeader::FIN)
      {
        ++m_nextRxSequence; //bump this to account for the FIN
        if (m_finSequence == g_zeroSeqn)
          {
            NS_LOG_ERROR("TcpSocketImpl::ProcessPacketAction: ACK_TX+FIN finSeqn==0 ERROR:"
                         << " m_nextTxSequence=" << m_nextTxSequence);
          }
        else if (m_finSequence < m_nextTxSequence)
          {
            NS_LOG_ERROR("TcpSocketImpl::ProcessPacketAction: ACK_TX+FIN SEQUENCE ERROR:"
                         << " m_finSequence=" << m_finSequence
                         << " m_nextTxSequence=" << m_nextTxSequence);
          }
        else
          {
            m_nextTxSequence = m_finSequence;
          }
      }
      SendEmptyPacket (TcpHeader::ACK);
      break;

With those changes, everything works okay.

I realize this might not be the best solution; I'll leave it to someone who's more familiar with this code and TCP to decide whether this cures the disease or just treats the symptoms.

Of course, if you decide it's curing the disease, it would be better to add an explicit "m_finSequenceHasBeenSet" flag rather than assuming that 0 is always an invalid sequence number.

Comment 2 Bill Roome 2010-10-18 10:08:28 EDT

Opps -- my comment #1 is just a restatement of my original description of the problem. When I submitted the first original description, the bug tracker complained that my session had timed out, so I thought it had lost the first version.

I'd delete comment #1, but I don't see any way to do that.

Comment 3 Tom Henderson 2010-10-24 00:49:56 EDT

I looked at this and m_finSequence seems to be trying to track different things at different places in the code (the transmit sequence number of the FIN, or the received sequence number of the FIN).

Does the patch attached fix the problem for you?  It tries to remove the cases in which m_finSequence has any relation to transmit sequence numbers.

Comment 4 Tom Henderson 2010-10-24 00:50:23 EDT

Created attachment 1004 [details]
patch to test

Comment 5 Bill Roome 2010-10-25 11:03:39 EDT

Yes, that patch fixes the problem; my tests run to completion. Thank you!

Comment 6 Tom Henderson 2010-11-17 19:23:55 EST

pushed in changeset 46e44607166f