Bug 1770 - mesh test and example crash with gcc-4.8.1
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: mesh
Version: pre-release
Hardware: PC Linux
Importance: P3 normal
Assigned To: Daniel L.
Reported: 2013-10-11 12:44 EDT by Tom Henderson
Modified: 2015-01-14 18:00 EST (History)

Description Tom Henderson 2013-10-11 12:44:58 EDT
I added Fedora Core 19 (32-bit) to the buildslaves, and one example and one test crash (both of them mesh) when building in optimized mode.  This hasn't been observed yet on any other compiler/platform.  It is present in ns-3-dev and ns-3.18.

It is not observed on Fedora 19 64-bit in optimized build, however.

Here is a backtrace:

./waf --command-template="gdb %s" --run mesh

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0xb7b3bd59 in ns3::WifiInformationElement::DeserializeIfPresent (
    this=this@entry=0xbfffeadc, i=...)
    at ../src/wifi/model/wifi-information-element.cc:70
70	  if (elementId != ElementId ())
Missing separate debuginfos, use: debuginfo-install glibc-2.17-18.fc19.i686 libgcc-4.8.1-1.fc19.i686 libstdc++-4.8.1-1.fc19.i686 libxml2-2.9.1-1.fc19.i686 xz-libs-5.1.2-4alpha.fc19.i686 zlib-1.2.7-10.fc19.i686
(gdb) p elementId
$1 = <optimized out>
(gdb) bt
#0  0xb7b3bd59 in ns3::WifiInformationElement::DeserializeIfPresent (
    this=this@entry=0xbfffeadc, i=...)
    at ../src/wifi/model/wifi-information-element.cc:70
#1  0xb7b91063 in ns3::MgtProbeResponseHeader::Deserialize (
    this=this@entry=0xbfffea60, start=...)
    at ../src/wifi/model/mgt-headers.cc:250
#2  0xb793715a in ns3::Packet::PeekHeader (this=0x80b61b0, header=...)
    at ../src/network/model/packet.cc:279
#3  0xb7f9103e in ns3::MeshWifiInterfaceMac::Receive (this=0x0, 
    packet=<error reading variable: Cannot access memory at address 0x0>, hdr=
    0x0) at ../src/mesh/model/mesh-wifi-interface-mac.cc:429
#4  0x00000000 in ?? ()
Comment 1 Daniel L. 2013-10-30 14:11:23 EDT
I tested with gcc 4.8.2 on Fedora 19 (32-bit). The optimized build still has the same problem.
Comment 2 Tom Henderson 2013-12-14 02:04:54 EST
Not clear whether this is a compiler issue; leave open for now but possibly close as wontfix if it disappears in future compiler versions.
Comment 3 Tom Henderson 2014-04-26 08:46:49 EDT
Also visible on Fedora Core 20, 32-bit.
Comment 4 Tom Henderson 2014-05-30 13:36:29 EDT
valgrind output for this crashing test

valgrind ./ns3-dev-test-runner-optimized --suite=devices-mesh-dot11s-regression
==6381== Memcheck, a memory error detector
==6381== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==6381== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==6381== Command: ./ns3-dev-test-runner-optimized --suite=devices-mesh-dot11s-regression
==6381== 
==6381== Use of uninitialised value of size 4
==6381==    at 0x570035B: ns3::WifiInformationElement::DeserializeIfPresent(ns3::Buffer::Iterator) (wifi-information-element.cc:70)
==6381==    by 0x5753EC2: ns3::MgtProbeResponseHeader::Deserialize(ns3::Buffer::Iterator) (mgt-headers.cc:250)
==6381==    by 0x5BD1189: ns3::Packet::PeekHeader(ns3::Header&) const (packet.cc:279)
==6381==    by 0x4C00C9D: ns3::MeshWifiInterfaceMac::Receive(ns3::Ptr<ns3::Packet>, ns3::WifiMacHeader const*) (mesh-wifi-interface-mac.cc:429)
==6381== 
==6381== Invalid read of size 4
==6381==    at 0x570035B: ns3::WifiInformationElement::DeserializeIfPresent(ns3::Buffer::Iterator) (wifi-information-element.cc:70)
==6381==    by 0x5753EC2: ns3::MgtProbeResponseHeader::Deserialize(ns3::Buffer::Iterator) (mgt-headers.cc:250)
==6381==    by 0x5BD1189: ns3::Packet::PeekHeader(ns3::Header&) const (packet.cc:279)
==6381==    by 0x4C00C9D: ns3::MeshWifiInterfaceMac::Receive(ns3::Ptr<ns3::Packet>, ns3::WifiMacHeader const*) (mesh-wifi-interface-mac.cc:429)
==6381==  Address 0x8 is not stack'd, malloc'd or (recently) free'd
==6381== 
==6381== 
==6381== Process terminating with default action of signal 11 (SIGSEGV)
==6381==  Access not within mapped region at address 0x8
==6381==    at 0x570035B: ns3::WifiInformationElement::DeserializeIfPresent(ns3::Buffer::Iterator) (wifi-information-element.cc:70)
==6381==    by 0x5753EC2: ns3::MgtProbeResponseHeader::Deserialize(ns3::Buffer::Iterator) (mgt-headers.cc:250)
==6381==    by 0x5BD1189: ns3::Packet::PeekHeader(ns3::Header&) const (packet.cc:279)
==6381==    by 0x4C00C9D: ns3::MeshWifiInterfaceMac::Receive(ns3::Ptr<ns3::Packet>, ns3::WifiMacHeader const*) (mesh-wifi-interface-mac.cc:429)
==6381==  If you believe this happened as a result of a stack
==6381==  overflow in your program's main thread (unlikely but
==6381==  possible), you can try to increase the size of the
==6381==  main thread stack using the --main-stacksize= flag.
==6381==  The main thread stack size used in this run was 8388608.
==6381== 
==6381== HEAP SUMMARY:
==6381==     in use at exit: 851,421 bytes in 16,996 blocks
==6381==   total heap usage: 29,387 allocs, 12,391 frees, 1,950,160 bytes allocated
==6381== 
==6381== LEAK SUMMARY:
==6381==    definitely lost: 0 bytes in 0 blocks
==6381==    indirectly lost: 0 bytes in 0 blocks
==6381==      possibly lost: 313,749 bytes in 7,043 blocks
==6381==    still reachable: 537,672 bytes in 9,953 blocks
==6381==         suppressed: 0 bytes in 0 blocks
==6381== Rerun with --leak-check=full to see details of leaked memory
==6381== 
==6381== For counts of detected and suppressed errors, rerun with: -v
==6381== Use --track-origins=yes to see where uninitialised values come from
==6381== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
Comment 5 Tom Henderson 2014-05-30 14:46:28 EDT
This problem is introduced in changeset 10139:17a71cd49da3 with addition of htCapabilities.

It seems that perhaps the mesh code beacons need to add some support for HtCapabilities (or the code in Wifi needs to be reworked to account for the possibility that a model like mesh does not have HtCapability awareness).

In the wifi code, there is code such as

  MgtProbeResponseHeader probe;
  probe.SetSsid (GetSsid ());
  probe.SetSupportedRates (GetSupportedRates ());
  probe.SetBeaconIntervalUs (m_beaconInterval.GetMicroSeconds ());
  if (m_htSupported)
    {
      probe.SetHtCapabilities (GetHtCapabilities());
      hdr.SetNoOrder();
    }


In the mesh code, it looks like this (mesh-wifi-beacon.cc):

MeshWifiBeacon::MeshWifiBeacon (Ssid ssid, SupportedRates rates, uint64_t us)
{
  m_header.SetSsid (ssid);
  m_header.SetSupportedRates (rates);
  m_header.SetBeaconIntervalUs (us);
}

It seems that perhaps the deserialization code expects HT support:

uint32_t
MgtProbeResponseHeader::Deserialize (Buffer::Iterator start)
{
  Buffer::Iterator i = start;
  m_timestamp = i.ReadLsbtohU64 ();
  m_beaconInterval = i.ReadLsbtohU16 ();
  m_beaconInterval *= 1024;
  i = m_capability.Deserialize (i);
  i = m_ssid.Deserialize (i);
  i = m_rates.Deserialize (i);
  //i.Next (3); // ds parameter set
  i = m_rates.extended.DeserializeIfPresent (i);
  i = m_htCapability.DeserializeIfPresent (i);
  return i.GetDistanceFrom (start);
}

DeserializeIfPresent() has to perform a read:

  Buffer::Iterator start = i;
  uint8_t elementId = i.ReadU8 ();


so perhaps there needs to be an outer check before entering DeserializeIfPresent on whether the iterator can support another read.
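
A minimal sketch of such an outer check, for illustration only. ByteReader, Remaining (), and this free-function DeserializeIfPresent are hypothetical stand-ins, not the ns-3 Buffer::Iterator or WifiInformationElement API; the element layout assumed here is one ID byte, one length byte, then the payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a buffer iterator (not ns-3's Buffer::Iterator).
class ByteReader
{
public:
  explicit ByteReader (const std::vector<uint8_t> &data)
    : m_data (&data), m_pos (0) {}
  std::size_t Remaining () const { return m_data->size () - m_pos; }
  std::size_t Position () const { return m_pos; }
  // Unchecked read: the caller must verify Remaining () first.
  uint8_t ReadU8 () { return (*m_data)[m_pos++]; }

private:
  const std::vector<uint8_t> *m_data;
  std::size_t m_pos;
};

// Consume an [id][length][payload] element only if it is present and
// complete. On any mismatch the reader is left exactly where it was.
bool
DeserializeIfPresent (ByteReader &reader, uint8_t expectedId,
                      std::vector<uint8_t> &payload)
{
  if (reader.Remaining () < 2)
    {
      return false; // not even room for the id and length bytes
    }
  ByteReader probe = reader; // work on a copy so failure has no side effects
  uint8_t id = probe.ReadU8 ();
  uint8_t length = probe.ReadU8 ();
  if (id != expectedId || probe.Remaining () < length)
    {
      return false; // a different element, or a truncated one
    }
  payload.clear ();
  for (uint8_t k = 0; k < length; ++k)
    {
      payload.push_back (probe.ReadU8 ());
    }
  reader = probe; // commit the consumed bytes
  return true;
}
```

With this shape, a header whose optional trailing element is absent simply gets false back, and no read is attempted past the end of the buffer.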
Comment 6 Tom Henderson 2014-06-05 17:20:08 EDT
Some more information about this crasher.

- crash is observed on optimization levels -O2, -O3, but not on -O0 and -O1
- crash is observed also on g++-4.9.0 (32-bit)

I stepped through this with a debugger, and found that the vtable for m_htCapability was getting corrupted somehow, immediately after line 249 in MgtProbeResponseHeader::Deserialize (), such that it crashes on line 250 within DeserializeIfPresent().  It wasn't apparent what the problem with the code was.

I also observed an issue that WifiInformationElement derives from SimpleRefCount, yet it is used as a normal class (without reference counting pointer) within src/wifi.  But in the mesh code, it is used as a reference-counted Object.
Comment 7 Tom Henderson 2014-06-07 12:01:30 EDT
Decision on ns-3.20 for this bug:  wontfix, but add as a known issue:

This was added to RELEASE_NOTES:

Known issues
------------
- Bug 1770 - The mesh module will crash if used for g++ version >= 4.8.1
in optimized mode, on a 32-bit Linux machine.  Lowering the optimization
level to -O1 in this case can be used as a workaround.


After the release, I suggest trying to refactor the usage of WifiInformationElement to remove the SimpleRefCount dependence (this will require an update to the mesh module) and seeing whether that resolves the crash (see comment 6).
Comment 8 Tom Henderson 2015-01-14 18:00:28 EST
fixed in changeset 11126:dd28df6a4b64

The problem was that the wifi extended rates information element contains a back-pointer to the SupportedRates IE.  In the course of creating mesh beacon packets, the SupportedRates IE is copied, but the address stored in the extended rates IE is not updated, and points to freed memory when the original SupportedRates object goes out of scope.

The overall design of the IE handling needs some work, but bug 881 is reopened for this.
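
The back-pointer problem described above can be illustrated with a simplified model. This is a sketch under assumptions, not the code from changeset 11126: Rates and ExtendedRates are hypothetical stand-ins for the SupportedRates and extended rates IEs, and rebinding the pointer in the copy operations is only one possible way to keep it valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rates; // hypothetical stand-in for the SupportedRates IE

// Hypothetical stand-in for the extended rates IE: it owns no data and
// reads everything through a back-pointer to the enclosing Rates object.
struct ExtendedRates
{
  explicit ExtendedRates (Rates *owner) : m_owner (owner) {}
  std::size_t Count () const;
  Rates *m_owner;
};

struct Rates
{
  Rates () : extended (this) {}
  // A compiler-generated copy would duplicate other.extended.m_owner, which
  // still points at 'other' and dangles once 'other' is destroyed.
  // Binding the member to 'this' keeps the back-pointer valid.
  Rates (const Rates &other) : values (other.values), extended (this) {}
  Rates &operator= (const Rates &other)
  {
    values = other.values; // 'extended' is untouched and still points here
    return *this;
  }
  std::vector<uint8_t> values;
  ExtendedRates extended;
};

// In this model, rates beyond the first eight count as "extended".
inline std::size_t
ExtendedRates::Count () const
{
  std::size_t n = m_owner->values.size ();
  return n > 8 ? n - 8 : 0;
}
```

Without the user-defined copy operations, a copy of Rates outliving its source would dereference freed memory in Count (), which matches the kind of failure seen in the optimized 32-bit builds.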