Bug 428 - build dependency problem with nsc (was: segfault in tcp-nsc-lfn)
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: samples
Version: ns-3-dev
Hardware: All All
Importance: P1 normal
Assigned To: Gustavo J. A. M. Carneiro
Reported: 2008-12-01 15:35 EST by Rajib Bhattacharjea
Modified: 2009-01-23 10:21 EST

Description Rajib Bhattacharjea 2008-12-01 15:35:17 EST
./waf --regression reports that:
Command ['/home/raj/code.nsnam.org/ns-3-dev/build/debug/examples/tcp-nsc-lfn', '--ns3::OnOffApplication::DataRate=40000', '--runtime=20'] exited with code -11 

Running in GDB gives the following backtrace.  Can anyone reproduce?

(gdb) set args --ns3::OnOffApplication::DataRate=40000 --runtime=20
(gdb) r
Starting program: /home/raj/code.nsnam.org/ns-3-dev/build/debug/examples/tcp-nsc-lfn --ns3::OnOffApplication::DataRate=40000 --runtime=20
[Thread debugging using libthread_db enabled]
[New Thread 0xb6bfb980 (LWP 12303)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6bfb980 (LWP 12303)]
0xb6f2ede0 in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0  0xb6f2ede0 in ?? () from /lib/tls/i686/cmov/libc.so.6
#1  0xb6f30edd in ?? () from /lib/tls/i686/cmov/libc.so.6
#2  0xb6f32cad in malloc () from /lib/tls/i686/cmov/libc.so.6
#3  0xb70fd447 in operator new () from /usr/lib/libstdc++.so.6
#4  0xb70fd57d in operator new[] () from /usr/lib/libstdc++.so.6
#5  0xb7b4b161 in BufferAllocate (reqSize=4096) at ../src/common/buffer.cc:136
#6  0xb7b4b241 in ns3::Buffer::Create (dataSize=4096) at ../src/common/buffer.cc:185
#7  0xb7b4bf4a in ns3::Buffer::AddAtStart (this=0xbfbc1810, start=4096) at ../src/common/buffer.cc:370
#8  0xb7b4cded in ns3::Buffer::CreateFullCopy (this=0x80bd9a8) at ../src/common/buffer.cc:598
#9  0xb7b4d1bb in ns3::Buffer::TransformIntoRealBuffer (this=0x80bd9a8) at ../src/common/buffer.cc:631
#10 0xb7b4d383 in ns3::Buffer::PeekData (this=0x80bd9a8) at ../src/common/buffer.cc:641
#11 0xb7b7bd33 in ns3::Packet::PeekData (this=0x80bd9a8) at ../src/common/packet.cc:296
#12 0xb7c912d7 in ns3::NscTcpSocketImpl::SendPendingData (this=0x80bc6e0) at ../src/internet-stack/nsc-tcp-socket-impl.cc:631
#13 0xb7c8fa52 in Notify (this=0x80be7d8) at debug/ns3/make-event.h:88
#14 0xb7b126f2 in ns3::EventImpl::Invoke (this=0x80be7d8) at ../src/simulator/event-impl.cc:39
#15 0xb7b2ce59 in ns3::DefaultSimulatorImpl::ProcessOneEvent (this=0x806afc0) at ../src/simulator/default-simulator-impl.cc:121
#16 0xb7b2ceaf in ns3::DefaultSimulatorImpl::Run (this=0x806afc0) at ../src/simulator/default-simulator-impl.cc:152
#17 0xb7b1b477 in ns3::Simulator::Run () at ../src/simulator/simulator.cc:151
#18 0x08050c7e in main (argc=3, argv=0xbfbc2074) at ../examples/tcp-nsc-lfn.cc:141
Comment 1 Rajib Bhattacharjea 2008-12-03 15:15:19 EST
g++ (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
x86 Ubuntu 8.04
32bit Pentium 4
Comment 2 Craig Dowell 2008-12-03 16:11:13 EST
Sam, can you check and see if you can reproduce this?  I'm trying it on a couple of local machines to see if I can give you any more info ...

Thanks.
Comment 3 Sam Jansen 2008-12-03 16:26:59 EST
Looks like I can reproduce it on my 64-bit machine.

What I see in the stack trace is fun:

(gdb) r  --ns3::OnOffApplication::DataRate=40000 --runtime=20
Starting program: ns-3-dev/build/debug/examples/tcp-nsc-lfn --ns3::OnOffApplication::DataRate=40000 --runtime=20
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0x7f8ad1ec56f0 (LWP 31364)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8ad1ec56f0 (LWP 31364)]
0x00007f8ad1677751 in ns3::Ipv4Address::Deserialize (buf=0xe0000020 <Address 0xe0000020 out of bounds>) at ../src/node/ipv4-address.cc:220
220	  ipv4.m_address |= buf[0];
(gdb) bt
#0  0x00007f8ad1677751 in ns3::Ipv4Address::Deserialize (buf=0xe0000020 <Address 0xe0000020 out of bounds>) at ../src/node/ipv4-address.cc:220
#1  0x00007f8ad1748999 in ns3::NscTcpSocketImpl::CompleteFork (this=Cannot access memory at address 0xdfffff64
) at ../src/internet-stack/nsc-tcp-socket-impl.cc:506
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I bet Florian could find the bug quickly, I'll cc him. In the meantime I'll continue having a look, but I doubt I'll find it any faster than anybody else!
Comment 4 Florian Westphal 2008-12-03 16:35:12 EST
Haven't been able to reproduce this on my box (x86, gcc 4.1.2), will check if I can break things on another machine tomorrow.
Comment 5 Sam Jansen 2008-12-03 16:35:22 EST
I think the problem is that the ns-3 build system isn't rebuilding NSC.

I'm still looking into it, but that looks about 95% likely at this stage.
Comment 6 Sam Jansen 2008-12-03 16:36:45 EST
(In reply to comment #5)
> I think the problem is that the ns-3 build system isn't rebuilding NSC.
> 
> I'm still looking into it, but that looks about 95% likely at this stage.
> 

So this will only affect people with an old version of NSC already built. A clean checkout will probably work fine.

The reason is that the interface in NSC changed recently. If you have an old build of NSC, ns-3 is compiled against the new interface but loads a library that implements the old one, so you'll get random errors.

I have to go now, perhaps a build system person could check if I'm right here.
Comment 7 Craig Dowell 2008-12-03 17:17:44 EST
I have been doing clean builds in all of my testing and have not been able to reproduce on any platform.
Comment 8 Sam Jansen 2008-12-03 19:46:35 EST
I've confirmed it is due to a stale version of NSC. I just did the following to fix it on my system:

cd nsc
python scons.py
cd ..
./waf --regression

And now it works fine.

I'm just looking at src/internet-stack/wscript; class NscBuildTask. The logic looks a bit crazy to me. As far as I can tell, it conditionally builds NSC based on whether the symlink exists. Doesn't seem like a very robust test. Why not just cd into the NSC dir and execute a build command every time?

I think somebody who knows the build system needs to take on this bug.
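
A minimal sketch of a sturdier staleness test than "does the symlink exist", assuming the NSC sources live in one directory tree and the build produces a single library file. The function names, the suffix list, and the paths are hypothetical and are not taken from src/internet-stack/wscript; this only illustrates comparing modification times instead of checking existence.

```python
import os


def newest_mtime(root, suffixes=(".c", ".cc", ".h", ".py")):
    """Return the newest modification time of matching files under root.

    The suffix list is an assumption about which files count as sources.
    """
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                newest = max(newest, os.path.getmtime(path))
    return newest


def nsc_needs_rebuild(nsc_dir, lib_path):
    """Rebuild if the library is missing or older than any source file."""
    if not os.path.exists(lib_path):
        return True
    return newest_mtime(nsc_dir) > os.path.getmtime(lib_path)
```

A check like this would still skip the rebuild when nothing changed, but it would catch the case in this bug, where the sources were updated and the old library was left in place. Walking the whole source tree has a cost of its own, so whether it is cheap enough to run on every build would need measuring.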
Comment 9 Gustavo J. A. M. Carneiro 2008-12-04 06:31:24 EST
(In reply to comment #8)
> I've confirmed it is due to a stale version of NSC. I just did the following to
> fix it on my system:
> 
> cd nsc
> python scons.py
> cd ..
> ./waf --regression
> 
> And now it works fine.
> 
> I'm just looking at src/internet-stack/wscript; class NscBuildTask. The logic
> looks a bit crazy to me. As far as I can tell, it conditionally builds NSC
> based on whether the symlink exists. Doesn't seem like a very robust test. Why
> not just cd into the NSC dir and execute a build command every time?

Because ./waf on a freshly built waf+nsc takes less than a second if we don't tell nsc to build itself, and takes over 30 seconds if we include nsc?

If I could offer a compromise solution, we could build NSC only when NSC is updated.  To do that we simply remove the .so files during configure if --enable-nsc is given (possibly causing a mercurial update of NSC).  This way ns-3 will build NSC the next time it builds.
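
The compromise above could look roughly like the following sketch, which is not the change that was eventually committed. It assumes the built NSC libraries (or symlinks to them) sit in one directory and match a glob pattern; the function name, the directory argument, and the "libnsc*.so" pattern are all hypothetical.

```python
import glob
import os


def remove_stale_nsc_libs(lib_dir, pattern="libnsc*.so"):
    """Delete previously built NSC shared libraries found in lib_dir.

    Intended to run at configure time when NSC is enabled, so that the
    next build sees no library and rebuilds NSC from the updated sources.
    Returns the sorted list of removed paths.
    """
    removed = []
    for path in glob.glob(os.path.join(lib_dir, pattern)):
        # os.remove deletes a symlink itself, not the file it points to.
        os.remove(path)
        removed.append(path)
    return sorted(removed)
```

Called from the configure step only when --enable-nsc is given, this keeps ordinary ./waf runs fast and forces one NSC rebuild after each configure.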
Comment 10 Sam Jansen 2008-12-06 02:38:30 EST
It seems reasonable to only build nsc if we've done a mercurial sync. What you propose sounds like a workable solution to me.
Comment 11 Craig Dowell 2008-12-10 15:41:14 EST
Assigned to the buildmeister and priority reduced.  Not considered a blocker for ns-3.3.

Can be addressed as part of build system refactoring.
Comment 12 Gustavo J. A. M. Carneiro 2008-12-14 18:39:16 EST
(In reply to comment #11)
> Assigned to the buildmeister and priority reduced.  Not considered a blocker
> for ns-3.3.
> 
> Can be addressed as part of build system refactoring.
> 

-> http://code.nsnam.org/gjc/ns-3-allinone/rev/7d82b52cd662