Bug 1072 - crash upon call to Packet::AddAtEnd of fragmented packets
crash upon call to Packet::AddAtEnd of fragmented packets
Status: VERIFIED FIXED
Product: ns-3
Classification: Unclassified
Component: core
ns-3-dev
All All
: P5 normal
Assigned To: Mathieu Lacage
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-03-15 04:19 EDT by Mathieu Lacage
Modified: 2012-08-15 12:17 EDT (History)
5 users (show)

See Also:


Attachments
trigger assert+fix (5.73 KB, patch)
2011-03-15 04:19 EDT, Mathieu Lacage
Details | Diff
method PacketMetadata::AddAtEnd() with NS_LOG_INFO statements (2.45 KB, text/x-c++src)
2011-03-31 10:24 EDT, Marc Reichelt
Details
Last log output of last call to AddAtEnd() - level INFO (420 bytes, text/x-log)
2011-03-31 10:26 EDT, Marc Reichelt
Details
Last log output of last call to AddAtEnd() - level ALL (811.87 KB, text/x-log)
2011-03-31 10:27 EDT, Marc Reichelt
Details
New log output of last call to AddAtEnd() (with logs of m_used value) (934.54 KB, text/x-log)
2011-04-05 14:30 EDT, Marc Reichelt
Details
Extremely simple testcase (3.94 KB, application/octet-stream)
2011-04-18 20:38 EDT, Tommaso Pecorella
Details
reduced testcase (1.62 KB, patch)
2011-04-21 03:26 EDT, Mathieu Lacage
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mathieu Lacage 2011-03-15 04:19:38 EDT
Created attachment 1041 [details]
trigger assert+fix

It is hard to describe the exact situation that triggers this crash, but the attached patch contains a reproducible testcase that triggers an assert together with the fix to fix this testcase.

originally reported by mcreichelt@googlemail.com
Comment 1 Mathieu Lacage 2011-03-15 04:27:48 EDT
changeset: 0d0525001175
Comment 2 Marc Reichelt 2011-03-31 10:23:28 EDT
I reopened this bug because the problems with PacketMetadata still occur - just after using packet fragmentation / appending packets to a buffer packet more often. The 3 files attached should provide more information: One contains the AddAtEnd method as I see it (with the NS_LOG_INFO commands added). One contains the log of INFO level and the other one level ALL for the very last call of PacketMetadata::AddAtEnd().
Comment 3 Marc Reichelt 2011-03-31 10:24:55 EDT
Created attachment 1056 [details]
method PacketMetadata::AddAtEnd() with NS_LOG_INFO statements
Comment 4 Marc Reichelt 2011-03-31 10:26:29 EDT
Created attachment 1057 [details]
Last log output of last call to AddAtEnd() - level INFO
Comment 5 Marc Reichelt 2011-03-31 10:27:22 EDT
Created attachment 1058 [details]
Last log output of last call to AddAtEnd() - level ALL
Comment 6 Mathieu Lacage 2011-04-01 03:11:31 EDT
What I would like to see is log output that shows which piece of code is setting m_used to an invalid value. i.e., look at the method AddAtEnd and the functions it calls, identify which pieces of code set m_used, and then, add logging statements there to see which of them is causing the problem. 

Basically, you have all the information you need: the failing assert is telling you what is causing the problem and you need to figure out which piece of code is setting the incorrect state.
Comment 7 Marc Reichelt 2011-04-05 14:30:36 EDT
Created attachment 1061 [details]
New log output of last call to AddAtEnd() (with logs of m_used value)

I added the log output including the value changes of m_used.
But I still can not figure out which code is responsible for the problem.
However, I noticed that there seems to be an integer overflow of m_used (for example here on lines 5875 to 5888):

35.7923s PacketMetadata:AddBig(0x8f59938, 65535, 65490, 21913, 21947, 1, 2048, 1679, 576, 760, 510)
35.7923s PacketMetadata:Create(): create size=65524, max=65507
35.7923s PacketMetadata:Create(): create alloc size=65524
35.7923s PacketMetadata:Recycle(): recycle size=65507, list=0
35.7923s PacketMetadata:UpdateTail(0x8f59938, 17)
35.7923s PacketMetadata:UpdateTail(): m_used += written; (value: 65524)                     // value of m_used is 65524
35.7923s PacketMetadata:ReadItems(0x8f62088, 21913)
35.7923s PacketMetadata:AddBig(0x8f59938, 65535, 65507, 21896, 21930, 1, 2048, 1679, 744, 786, 510)
35.7923s PacketMetadata:Create(): create size=65541, max=65524
35.7923s PacketMetadata:Create(): create alloc size=65541
35.7923s PacketMetadata:Recycle(): recycle size=65524, list=0
35.7923s PacketMetadata:UpdateTail(0x8f59938, 17)
35.7923s PacketMetadata:UpdateTail(): m_used += written; (value: 5)                         // whoops?

Is this intended? This bug really drives me crazy - and I can not start my simulation for my diploma thesis until it is fixed... :-/
Moreover, I do not understand why the packet metadata grows so heavily in the first place - because I always remove the packets from the buffer again.

If you want the full log I can send it to you via mail - it is 42 MB uncompressed (1,7 MB compressed).
Comment 8 Tommaso Pecorella 2011-04-18 20:38:21 EDT
Created attachment 1084 [details]
Extremely simple testcase

I think this is the very same bug I found, and it is actually stopping the IPv4 fragmentation (and it's bugging the IPv6 fragmentation as well I think).

I'd raise its priority if possible.

I did some checks, but I found myself lost in the deeps of buffers and such. I'll try to dig more for a patch, but I'm not that confident I'll be able to fix this.

The testcase I uploaded is a damn simple one:
1) get a packet
2) add an header
3) split the packet in two
4) add EthernetHeader & EthernetTrailer to fragments
5) remove the Eth headers and trailers
6) use AddAtEnd to combine the packets.
7) watch the crash.

If you skip steps 4 and 5 everything works as intended. Pity that packets have to have headers and trailers added and removed...

My best guess is that during the AddAtEnd the metadata aren't rebuilt correctly. Shoot me if I have a clue about how to fix it.

T.
Comment 9 Tommaso Pecorella 2011-04-19 13:32:21 EDT
I found the bugged code, but I can't figure out how to fix it (I'm too sleepy after the debugging).

Anyway, the bug is in a code optimization, and here we could state something about code optimization and bugs made faster.

In packet-metadata.cc, function PacketMetadata::AddAtEnd (PacketMetadata const&o), lines 826 and sgg.:


  if (extraItem.packetUid == tailExtraItem.packetUid &&
      item.typeUid == tailItem.typeUid &&
      item.chunkUid == tailItem.chunkUid &&
      item.size == tailItem.size &&
      extraItem.fragmentStart == tailExtraItem.fragmentEnd)
    {
      /* If the previous tail came from the same header as
       * the next item we want to append to our array, then, 
       * we merge them and attempt to reuse the previous tail's
       * location.
       */
      tailExtraItem.fragmentEnd = extraItem.fragmentEnd;
      ReplaceTail (&tailItem, &tailExtraItem, tailSize);
      current = item.next;
    }

NOW: this optimization is surely nice, but it kicks in only in a very small number of cases.
The bug might as well be in PacketMetadata::ReplaceTail(), as it's only called from the above code.

Anyway, killing the optimization fixes the bug, at least for me.

The path to end this bug (I hope) is then rather simple:
1) remove the optimization, or
2) keep digging and find where in the "optimization" is the real bug.

I'd really like the second one, as the bug might be in another point and come again later on. On the other hand probably Mathieu could find it faster than me.

Cheers,

T.
Comment 10 Tommaso Pecorella 2011-04-19 21:11:16 EDT
Last report, now we ned Mathieu for real, as this is his code and I can't do anything but a bad patch.

It seems that an "alternative" way of fixing the issue is to remove the last block when the "optimization" kicks in.

As is:
-----

  else
    {
      current = o.m_head;
    }

  /* Now that we have merged our current tail with the head of the
   * next packet, we just append all items from the next packet
   * to the current packet.
   */

  while (current != 0xffff)
    {
      o.ReadItems (current, &item, &extraItem);
      uint16_t written = AddBig (0xffff, m_tail, &item, &extraItem);
      UpdateTail (written);
      if (current == o.m_tail)
        {
          break;
        }
      current = item.next;
    }
----

should be:
----

  else
    {
      current = o.m_head;
      /* Now that we have merged our current tail with the head of the
       * next packet, we just append all items from the next packet
       * to the current packet.
       */

      while (current != 0xffff)
        {
          o.ReadItems (current, &item, &extraItem);
          uint16_t written = AddBig (0xffff, m_tail, &item, &extraItem);
          UpdateTail (written);
          if (current == o.m_tail)
            {
              break;
            }
          current = item.next;
        }
    }
-----

Problem is: I can't be totally sure about it.
Doing that remove MY bug and the code I posted works as intended. On the other my explanation about why this solution is working is more magic that anything else:
"extraItem" has been already copied during the previous tail joining, so adding it again at the end of the metadata is wrong.

Of course the comment in the code is not anymore meaningful, and should be removed.

Mathieu, we need your wisdom here. I am totally wrong ? I killed a bug or I just added two more ?

Cheers,

T.
Comment 11 Mathieu Lacage 2011-04-21 03:26:31 EDT
Created attachment 1087 [details]
reduced testcase

this testcase indeed fails.
Comment 12 Mathieu Lacage 2011-04-21 07:42:14 EDT
changeset 90bb55906918
Comment 13 Tommaso Pecorella 2011-04-21 13:28:52 EDT
Small, simple and elegant. /bow
Comment 14 Cristian 2012-08-15 12:17:05 EDT
Hi all,

this bug seams not to have disappeared completely.
I'm using ns3.10 with the revised code which was supposed to fix this bug.

In my simulations I'm using the packet aggregation feature which makes the simulation crash a lot more than the case when no packet aggregation is used.

I've run my code under gdb and got the following:

assert failed. cond="IsStateOk ()", file=../src/common/packet-metadata.cc, line=866
terminate called without an active exception

Program received signal SIGABRT, Aborted.
0x00007ffff4adb445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff4adb445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff4adebab in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff542969d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff5427846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff5427873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff6d0927f in ns3::PacketMetadata::AddAtEnd (this=0xb8a910, o=...) at ../src/common/packet-metadata.cc:866
#6  0x00007ffff6d2a4a6 in ns3::Packet::AddAtEnd (this=0xb8a8d0, packet=...) at ../src/common/packet.cc:329
#7  0x00007ffff72b731a in ns3::MsduStandardAggregator::Aggregate (this=0xb77370, packet=..., aggregatedPacket=..., src=...,
    dest=...) at ../src/devices/wifi/msdu-standard-aggregator.cc:78
#8  0x00007ffff72a6917 in ns3::EdcaTxopN::NotifyAccessGranted (this=0x747430) at ../src/devices/wifi/edca-txop-n.cc:562
#9  0x00007ffff72b121e in ns3::EdcaTxopN::Dcf::DoNotifyAccessGranted (this=0x7475d0) at ../src/devices/wifi/edca-txop-n.cc:84
#10 0x00007ffff7231483 in ns3::DcfState::NotifyAccessGranted (this=0x7475d0) at ../src/devices/wifi/dcf-manager.cc:145
#11 0x00007ffff723284c in ns3::DcfManager::DoGrantAccess (this=0x746f70) at ../src/devices/wifi/dcf-manager.cc:434
#12 0x00007ffff723297a in ns3::DcfManager::AccessTimeout (this=0x746f70) at ../src/devices/wifi/dcf-manager.cc:450
#13 0x00007ffff72352ee in ns3::EventMemberImpl0::Notify (this=0xa65670) at debug/ns3/make-event.h:94
#14 0x00007ffff6c88ffc in ns3::EventImpl::Invoke (this=0xa65670) at ../src/simulator/event-impl.cc:37
#15 0x00007ffff6ca6146 in ns3::DefaultSimulatorImpl::ProcessOneEvent (this=0x6d8580)
    at ../src/simulator/default-simulator-impl.cc:128
#16 0x00007ffff6ca62f6 in ns3::DefaultSimulatorImpl::Run (this=0x6d8580) at ../src/simulator/default-simulator-impl.cc:158
#17 0x00007ffff6c89add in ns3::Simulator::Run () at ../src/simulator/simulator.cc:173
#18 0x00000000004298c2 in main (argc=5, argv=0x7fffffffe438) at ../scratch/mesh_voip_fairness_4.cc:1428
(gdb) up
#1  0x00007ffff4adebab in abort () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) up
#2  0x00007ffff542969d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) up
#3  0x00007ffff5427846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) up
#4  0x00007ffff5427873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) up
#5  0x00007ffff6d0927f in ns3::PacketMetadata::AddAtEnd (this=0xb8a910, o=...) at ../src/common/packet-metadata.cc:866
866      NS_ASSERT (IsStateOk ());
(gdb) info local
__FUNCTION__ = "AddAtEnd"
tailExtraItem = {fragmentStart = 0, fragmentEnd = 20, packetUid = 264998}
current = 63091
extraItem = {fragmentStart = 0, fragmentEnd = 20, packetUid = 265723}
tailItem = {next = 65535, prev = 65426, typeUid = 1, size = 20, chunkUid = 7824}
tailSize = 14
item = {next = 65535, prev = 63076, typeUid = 1, size = 20, chunkUid = 12033}
(gdb) info args
this = 0xb8a910
o = @0x92f2b0: {
  static m_freeList = {<std::vector<ns3::PacketMetadata::Data*, std::allocator<ns3::PacketMetadata::Data*> >> = {<std::_Vector_base<ns3::PacketMetadata::Data*, std::allocator<ns3::PacketMetadata::Data*> >> = {
        _M_impl = {<std::allocator<ns3::PacketMetadata::Data*>> = {<__gnu_cxx::new_allocator<ns3::PacketMetadata::Data*>> = {<No data fields>}, <No data fields>}, _M_start = 0xbb6270, _M_finish = 0xbb6270, _M_end_of_storage =
    0xbb6a70}}, <No data fields>}, <No data fields>}, static m_enable = true, static m_enableChecking = false,
  static m_metadataSkipped = false, static m_maxSize = 65544, static m_chunkUid = 62671, m_data = 0x1636a50, m_head = 64289,
  m_tail = 63091, m_used = 64298, m_packetUid = 264629}

Please let me and the ns3 community know if you spot something which may lead to another bugfix for this PacketMetadata related issue.

Many thanks,
Cristian

PS: this has been posted on the NS3 Google Group too, to reach out for more interested/affected people.