Bug 1554

Summary: all python programs intermittently crash
Product: ns-3
Reporter: Tom Henderson <tomh>
Component: examples
Assignee: ns-bugs <ns-bugs>
Status: RESOLVED FIXED    
Severity: blocker
CC: watrous
Priority: P5    
Version: pre-release   
Hardware: PC   
OS: Linux   
Bug Depends on: 954    
Bug Blocks:    

Description Tom Henderson 2012-12-13 14:36:50 EST
This is seen only on some platforms (OS X and Ubuntu 10.04 Linux) at the tip of ns-3-dev (f094300690db):

./waf shell
gdb python
(gdb) r examples/tutorial/first.py
Starting program: /usr/bin/python examples/tutorial/first.py
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffe72d1700 (LWP 23975)]

Program received signal SIGSEGV, Segmentation fault.
malloc_consolidate (av=0x7ffff6d37e40) at malloc.c:5136
5136	malloc.c: No such file or directory.
	in malloc.c
(gdb) bt
#0  malloc_consolidate (av=0x7ffff6d37e40) at malloc.c:5136
#1  0x00007ffff6a32ba1 in _int_malloc (av=0x7ffff6d37e40, bytes=512)
    at malloc.c:4737
#2  0x00007ffff6a33ade in *__GI___libc_malloc (bytes=512) at malloc.c:3660
#3  0x00007fffebca724d in operator new(unsigned long) ()
   from /usr/lib/libstdc++.so.6
#4  0x00007ffff5a35b23 in __gnu_cxx::new_allocator<ns3::Ptr<ns3::Packet> >::allocate (this=0x7fffffffd380, __n=64)
    at /usr/include/c++/4.4/ext/new_allocator.h:89
#5  0x00007ffff21786c5 in std::_Deque_base<ns3::Ptr<ns3::Packet>, std::allocator<ns3::Ptr<ns3::Packet> > >::_M_allocate_node (this=0x7fffffffd380)
    at /usr/include/c++/4.4/bits/stl_deque.h:444
#6  0x00007ffff2177df1 in std::_Deque_base<ns3::Ptr<ns3::Packet>, std::allocator<ns3::Ptr<ns3::Packet> > >::_M_create_nodes (this=0x7fffffffd380, __nstart=
    0xb84de8, __nfinish=0xb84df0) at /usr/include/c++/4.4/bits/stl_deque.h:538
#7  0x00007ffff21771f5 in std::_Deque_base<ns3::Ptr<ns3::Packet>, std::allocator<ns3::Ptr<ns3::Packet> > >::_M_initialize_map (this=0x7fffffffd380,
    __num_elements=0) at /usr/include/c++/4.4/bits/stl_deque.h:512
#8  0x00007ffff2176656 in _Deque_base (this=0x7fffffffd380)
    at /usr/include/c++/4.4/bits/stl_deque.h:375
#9  0x00007ffff2175eb4 in deque (this=0x7fffffffd380)
    at /usr/include/c++/4.4/bits/stl_deque.h:691
#10 0x00007ffff2168fee in UdpSocketImpl (this=0xb84c70)
    at ../src/internet/model/udp-socket-impl.cc:79
#11 0x00007ffff213e5ea in ns3::CreateObject<ns3::UdpSocketImpl> ()
    at ./ns3/object.h:393
#12 0x00007ffff2137810 in ns3::UdpL4Protocol::CreateSocket (this=0xb814b0)
    at ../src/internet/model/udp-l4-protocol.cc:165
#13 0x00007ffff2180364 in ns3::UdpSocketFactoryImpl::CreateSocket (this=
    0xb825e0) at ../src/internet/model/udp-socket-factory-impl.cc:45
#14 0x00007fffefd84539 in ns3::Socket::CreateSocket (node=..., tid=...)
    at ../src/network/model/socket.cc:77
#15 0x00007ffff3a6c96d in ns3::UdpEchoServer::StartApplication (this=0xb842e0)
    at ../src/applications/model/udp-echo-server.cc:81
#16 0x00007fffefd33efe in Notify (this=0x950b20) at ./ns3/make-event.h:94
#17 0x00007fffef92f83d in ns3::EventImpl::Invoke (this=0x950b20)
    at ../src/core/model/event-impl.cc:45
Comment 1 Mitch Watrous 2012-12-13 15:10:35 EST
The crash can be avoided in this example by adding the following line at the beginning of the script:

    ns.core.Time.SetResolution(ns.core.Time.NS)
Comment 2 Tom Henderson 2012-12-14 11:10:57 EST
All of our Python examples (not just first.py) are now intermittently crashing on some platforms (OS X, Ubuntu 10.04) with older gcc.

Explicit calls to SetResolution() seem to avoid the crashes.


This test program will intermittently crash:

import ns.core

ns.core.Simulator.Run()
ns.core.Simulator.Destroy()

This one will not crash:

import ns.core

ns.core.Time.SetResolution(ns.core.Time.NS)
ns.core.Simulator.Run()
ns.core.Simulator.Destroy()
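
Because the crash is intermittent, a single run says little about whether a given script is affected. The following is a minimal sketch of a harness that runs a command repeatedly and counts how many runs were killed by a signal. It is not part of ns-3; the function name count_crashes and the default of 20 runs are assumptions made for illustration. It relies on the POSIX convention that subprocess reports a child killed by signal N with return code -N.

```python
import signal
import subprocess
import sys


def count_crashes(cmd, runs=20):
    """Run cmd `runs` times and return {signal name: number of runs killed by it}.

    Hypothetical helper for illustration; not part of ns-3.
    """
    crashes = {}
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a negative return code -N means the child died from signal N.
        if result.returncode < 0:
            name = signal.Signals(-result.returncode).name
            crashes[name] = crashes.get(name, 0) + 1
    return crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    # The script to exercise is given on the command line.
    print(count_crashes([sys.executable, sys.argv[1]]))
```

Saved under a name of your choosing and run from inside ./waf shell (so that the ns modules can be imported), it could be pointed at each of the two test programs above to compare how often they die with SIGSEGV.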

When it crashes, it seems to be dying in this loop (m_event is a null pointer):

void
DefaultSimulatorImpl::Run (void)
{
  NS_LOG_FUNCTION (this);
  // Set the current threadId as the main threadId
  m_main = SystemThread::Self();
  ProcessEventsWithContext ();
  m_stop = false;

  while (!m_events->IsEmpty () && !m_stop)
    {
      ProcessOneEvent ();
    }

and it seems to be doing some thread context switching around that time.
Comment 3 Tom Henderson 2012-12-14 17:54:17 EST
Debugging showed this to be a side effect of the bug 954 patch.

The bug 954 patch can be reverted, and then this can be closed.
Comment 4 Tom Henderson 2012-12-17 00:52:09 EST
Fixed by reverting the bug 954 patch.