Bug 130 - Segfault when deleting a TCP socket in the "connection fail callback"
: Segfault when deleting a TCP socket in the "connection fail callback"
Status: RESOLVED FIXED
: ns-3
internet-stack
: pre-release
: All All
: P3 normal
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2008-02-04 13:47 EDT by
Modified: 2008-02-06 12:36 EDT (History)


Attachments


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2008-02-04 13:47:49 EDT
I have a script that removes the last reference to a TCP socket during the
"connection failed callback".  That causes a subsequent segmentation fault. 
Valgrind trace follows:

TCP connection failed.
==10582== Invalid read of size 8
==10582==    at 0x522792D: ns3::TcpSocket::ReTxTimeout() (tcp-socket.cc:992)
==10582==    by 0x52247E5: ns3::Ptr<ns3::EventImpl>
ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void
(ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify()
(simulator.h:661)
==10582==    by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39)
==10582==    by 0x516C159: ns3::SimulatorPrivate::ProcessOneEvent()
(simulator.cc:155)
==10582==    by 0x516C1A1: ns3::SimulatorPrivate::Run() (simulator.cc:183)
==10582==    by 0x516C2E2: ns3::Simulator::Run() (simulator.cc:418)
==10582==    by 0x40C442: main (bench-olsr.cc:750)
==10582==  Address 0x67fb660 is 0 bytes inside a block of size 408 free'd
==10582==    at 0x4C2162D: operator delete(void*) (vg_replace_malloc.c:336)
==10582==    by 0x522D253: ns3::TcpSocket::~TcpSocket() (tcp-socket.cc:101)
==10582==    by 0x5109A49: ns3::Object::MaybeDelete() const (object.cc:512)
==10582==    by 0x41107A: ns3::Object::Unref() const (object.h:437)
==10582==    by 0x4110A4: ns3::Ptr<ns3::Socket>::~Ptr() (ptr.h:402)
==10582==    by 0x51E6BA9: ns3::Socket::NotifyConnectionFailed()
(socket.cc:143)
==10582==    by 0x5227240: ns3::TcpSocket::Retransmit() (tcp-socket.cc:1030)
==10582==    by 0x52279CB: ns3::TcpSocket::ReTxTimeout() (tcp-socket.cc:999)
==10582==    by 0x52247E5: ns3::Ptr<ns3::EventImpl>
ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void
(ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify()
(simulator.h:661)
==10582==    by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39)
==10582== 
==10582== Process terminating with default action of signal 11 (SIGSEGV):
dumping core
==10582==  Bad permissions for mapped region at address 0x61D9F0
==10582==    at 0x61D9F0: ???
==10582==    by 0x52247E5: ns3::Ptr<ns3::EventImpl>
ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void
(ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify()
(simulator.h:661)
==10582==    by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39)
==10582==    by 0x516C159: ns3::SimulatorPrivate::ProcessOneEvent()
(simulator.cc:155)
==10582==    by 0x516C1A1: ns3::SimulatorPrivate::Run() (simulator.cc:183)
==10582==    by 0x516C2E2: ns3::Simulator::Run() (simulator.cc:418)
==10582==    by 0x40C442: main (bench-olsr.cc:750)
==10582==
------- Comment #1 From 2008-02-04 13:50:59 EDT -------
I see some lines like:

       m_retxEvent = Simulator::Schedule (rto,&TcpSocket::ReTxTimeout,this);

Most likely there's a scheduled event that was not properly cancelled in the
socket destructor.

*sigh* Why doesn't anyone believe me when I say that EventGarbageCollector is
needed? :-(
------- Comment #2 From 2008-02-04 13:54:58 EDT -------
Patch to fix the bug:

diff -r a1edfbcca81d src/internet-node/tcp-socket.cc
--- a/src/internet-node/tcp-socket.cc   Mon Feb 04 14:15:18 2008 +0000
+++ b/src/internet-node/tcp-socket.cc   Mon Feb 04 18:53:38 2008 +0000
@@ -121,6 +121,7 @@ TcpSocket::Destroy (void)
   m_node = 0;
   m_endPoint = 0;
   m_tcp = 0;
+  m_retxEvent.Cancel ();
 }
 int
 TcpSocket::FinishBind (void)
------- Comment #3 From 2008-02-05 10:02:23 EDT -------
Pushed Gustavo's fix as revision 2315 in ns-3-dev (changeset df5e3f1d2a1a)
------- Comment #4 From 2008-02-06 12:09:55 EDT -------
I found another similar segfault today.  I am not sure how to reproduce,
though.
Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to
~TcpSocket()?  I tried it locally, and it seems to fix the issue for me...
------- Comment #5 From 2008-02-06 12:25:32 EDT -------
(In reply to comment #4)
> I found another similar segfault today.  I am not sure how to reproduce,
> though.
> Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to
> ~TcpSocket()?  I tried it locally, and it seems to fix the issue for me...
> 

Please feel free to push your change...I'm not sure I understand the issue of
moving the location of the canceling of the event, but if it fixes a segfault
then please do.
------- Comment #6 From 2008-02-06 12:36:24 EDT -------
(In reply to comment #5)
> (In reply to comment #4)
> > I found another similar segfault today.  I am not sure how to reproduce,
> > though.
> > Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to
> > ~TcpSocket()?  I tried it locally, and it seems to fix the issue for me...
> > 
> 
> Please feel free to push your change...I'm not sure I understand the issue of
> moving the location of the canceling of the event, but if it fixes a segfault
> then please do.
> 

The issue is that, if you look at ~TcpSocket, you'll see that not all code
paths lead to Destroy() being called, i.e. it appears (on the surface) possible
for a TcpSocket to be freed without calling the Destroy() method first.

Fix pushed as 2318:d4217d7ba4ba.