Bug 931 – Abnormal exit reports SIGSEGV on failure

Bug 931 - Abnormal exit reports SIGSEGV on failure


Summary:	Abnormal exit reports SIGSEGV on failure

Status:	RESOLVED FIXED

Product:	ns-3
Classification:	Unclassified
Component:	core
Version:	ns-3-dev
Hardware:	All All

Importance:	P5 enhancement
Assigned To:	Mathieu Lacage

URL:
Keywords:

Depends on:
Blocks:	933
	Show dependency tree / graph

Reported:	2010-06-01 01:14 EDT by Quincy Tse
Modified:	2010-06-22 21:31 EDT (History)
CC List:	4 users (show)

See Also:

Attachments
proposed patch using std::abort (1.25 KB, application/octet-stream) 2010-06-01 02:07 EDT, Quincy Tse	Details
alternative patch using std::terminate (1.25 KB, patch) 2010-06-01 02:13 EDT, Quincy Tse	Details \| Diff
Reworked assert, abort and fatal-error with stream flushing capibilities (17.92 KB, patch) 2010-06-02 23:57 EDT, Quincy Tse	Details \| Diff
Patch changing the behaviour of assert, abort and fatal_error (13.02 KB, patch) 2010-06-03 01:46 EDT, Quincy Tse	Details \| Diff
Show Obsolete (3) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Quincy Tse 2010-06-01 01:14:41 EDT

NS_ASSERT and NS_ASSERT_MSG currently terminates the running program on assertion fail by dereferencing a null pointer, causing a SIGSEGV.

The return code of SIGSEGV may be confusing as it usually indicates a memory problem under normal use (eg. in normal C/C++ programs). The segmentation fault introduced here can mislead debug efforts, especially when the assert condition involves derefercing some pointers. (eg. NS_ASSERT(p->blah != SOME_VALUE)

In comparison, the standard assert() interrupts program flow by calling the function std::abort() from <cstdlib>, which raises a SIGABRT to the program.

Comment 1 Quincy Tse 2010-06-01 02:07:08 EDT

Created attachment 903 [details]
proposed patch using std::abort

This patch terminates program flow using std::abort, which is an abnormal termination and can be caught by gdb. Have not been tested under another debugger to check whether it can stop the debugger.

Comment 2 Quincy Tse 2010-06-01 02:13:30 EDT

Created attachment 904 [details]
alternative patch using std::terminate

std::terminate can be more flexible - it call specific handler to do some clean up tasks before dying. This is what is called when there are unhandled exceptions.

The default behaviour (since gcc 4.3) is to do a verbose exit (state the type of exception and print exception.what()). Here we raised a std::terminate without an exception, hence this default handler will just state the function was called without exception.

The behaviour can be changed using std::set_terminate(). Adding the line std::set_terminate(std::abort) to cause the std::terminate() function to die silently (but this will also cause unhandled exceptions to die silently).

Extra macros can be defined to change the behaviour between silent exit, verbose exit or a custom handler.

Comment 3 Antti Mäkelä 2010-06-01 03:12:48 EDT

If possible, I'd really like the program to flush and close all trace files that are open if performing a cleanup. 

It would be MUCH easier to do debugging if I could see in wireshark what's going on in the network just before stuff breaks, but right now all I get is bunch of empty files (if the captures are small), or at least captures that are cut off from the middle.

Comment 4 Mathieu Lacage 2010-06-01 03:17:52 EDT

(In reply to comment #3)
> If possible, I'd really like the program to flush and close all trace files
> that are open if performing a cleanup. 

It's not really easy to do. You need to flush all opened std::ostream files. For FILE* things, it's easy to call fflush (0) but I am not aware of the std::ostream equivalent. Given the implementation of the std::ostream class in the libstdc++ library, I suspect that calling fflush(0) might just work, even for streams opened through std::ostream but I don't have time to try this.

> 
> It would be MUCH easier to do debugging if I could see in wireshark what's
> going on in the network just before stuff breaks, but right now all I get is
> bunch of empty files (if the captures are small), or at least captures that are
> cut off from the middle.

agreed, this would be nice.

Comment 5 Mathieu Lacage 2010-06-01 03:22:39 EDT

(In reply to comment #4)
> (In reply to comment #3)
> > If possible, I'd really like the program to flush and close all trace files
> > that are open if performing a cleanup. 
> 
> It's not really easy to do. You need to flush all opened std::ostream files.
> For FILE* things, it's easy to call fflush (0) but I am not aware of the
> std::ostream equivalent. Given the implementation of the std::ostream class in
> the libstdc++ library, I suspect that calling fflush(0) might just work, even
> for streams opened through std::ostream but I don't have time to try this.


Naah, does not work:


#include <iostream>
#include <fstream>
#include <stdio.h>

int main(int argc, char *argv[])
{
  std::ofstream os;
  char buffer[10];
  os.open("test.foo", std::ios_base::out);

  os.write(buffer, 10);

  fflush (0);

  sleep	(20);

  return 0;
}

Comment 6 Mathieu Lacage 2010-06-01 03:25:33 EDT

(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > If possible, I'd really like the program to flush and close all trace files
> > > that are open if performing a cleanup. 
> > 
> > It's not really easy to do. You need to flush all opened std::ostream files.
> > For FILE* things, it's easy to call fflush (0) but I am not aware of the
> > std::ostream equivalent. Given the implementation of the std::ostream class in
> > the libstdc++ library, I suspect that calling fflush(0) might just work, even
> > for streams opened through std::ostream but I don't have time to try this.
> 
> 
> Naah, does not work:
> 
> 
> #include <iostream>
> #include <fstream>
> #include <stdio.h>
> 
> int main(int argc, char *argv[])
> {
>   std::ofstream os;
>   char buffer[10];
>   os.open("test.foo", std::ios_base::out);
> 
>   os.write(buffer, 10);
> 
>   fflush (0);
> 
>   sleep    (20);
> 
>   return 0;
> }

The quick hack would be to make PcapFile keep track of all opened std::ofstream instances in a global variable and provide a static member method Flush which calls the flush of each individual stream.

Comment 7 Antti Mäkelä 2010-06-01 03:50:16 EDT

(In reply to comment #5)
> Naah, does not work:

  According to c++ reference it should...

http://www.cplusplus.com/reference/clibrary/cstdio/fflush/

"If the argument is a null pointer, all open files are flushed."

Comment 8 Mathieu Lacage 2010-06-01 03:55:08 EDT

(In reply to comment #7)
> (In reply to comment #5)
> > Naah, does not work:
> 
>   According to c++ reference it should...
> 
> http://www.cplusplus.com/reference/clibrary/cstdio/fflush/
> 
> "If the argument is a null pointer, all open files are flushed."

Yes, 'all open files', not 'all open std::ostream'. The distinction is subtle but it's pretty clear and the current behavior I pointed out above is what I expected to happen.

Comment 9 Quincy Tse 2010-06-01 04:58:35 EDT

Another workaround might be to add functions to register/unregister the ostreams under assert.h/.cc. I think catering for FILE* and ostream* should be generic enough to be included there (we probably don't need FILE* version).

The onus will then be on whoever using the assert and want certain streams be flushed to register and unregister them. Unregistering will be more of a problem as you don't want seg faults in your abort routine, and normal testing won't pick it up (won't see the seg fault unless assert failed).

Comment 10 Mathieu Lacage 2010-06-01 05:01:37 EDT

(In reply to comment #9)
> Another workaround might be to add functions to register/unregister the
> ostreams under assert.h/.cc. I think catering for FILE* and ostream* should be
> generic enough to be included there (we probably don't need FILE* version).
> 
> The onus will then be on whoever using the assert and want certain streams be
> flushed to register and unregister them. Unregistering will be more of a
> problem as you don't want seg faults in your abort routine, and normal testing
> won't pick it up (won't see the seg fault unless assert failed).

Ok. Make sure you call these register/unregister functions from PcapFile::PcapFile and PcapFile::~PcapFile

Comment 11 Antti Mäkelä 2010-06-01 05:12:50 EDT

(In reply to comment #10)
> Ok. Make sure you call these register/unregister functions from
> PcapFile::PcapFile and PcapFile::~PcapFile

  What about ASCII traces?

Comment 12 Mathieu Lacage 2010-06-01 05:17:24 EDT

add an Open and Close method to OutputStreamWrapper, call them from TraceHelper, add register/unregister calls to its open/close methods.

Comment 13 Mathieu Lacage 2010-06-01 05:17:47 EDT

(In reply to comment #12)
> add an Open and Close method to OutputStreamWrapper, call them from
> TraceHelper, add register/unregister calls to its open/close methods.

and, please, kill SetStream.

Comment 14 Quincy Tse 2010-06-01 22:37:38 EDT

Just saw that fatal-error.h and abort.h also dereference null pointers to stop the program, so this change will also need to apply to them.

May be we should also refactor these 3 files so that they produce similar outputs. This should also give us a fair bit of code reuse.

NS_ASSERT: "assert failed. file=..., line=..., cond=...\n"
NS_ASSERT_MSG: "<whatever message>\n"
NS_ABORT_MSG: "file=..., line=..., abort, msg=...\n"
NS_ABORT_IF: "file=..., line=..., abort on=...\n"
NS_ABORT_MSG_IF: "file=..., line=..., abort on=..., msg=...\n"
NS_FATAL_ERROR: "<whatever message>\n"

Proposal:
"assert failed"/"aborted". cond=..., msg=..., file=... line=..."

Here:
NS_FATAL_ERROR_NO_MSG - print the file and line info and terminates
NS_FATAL_ERROR - print error message then call NS_FATAL_ERROR_NO_MSG
NS_ABORT_MSG - print "aborted." then call NS_FATAL_ERROR
NS_ABORT_IF - print "aborted.", the conditions then NS_FATAL_ERROR_NO_MSG
NS_ASSERT - print "assert failed.", the conditions then NS_FATAL_ERROR_NO_MSG
NS_ABORT_MSG_IF - print "aborted.", the conditions, then NS_FATAL_ERROR
NS_ASSERT_MSG_IF - print "assert failed.", the conditions, then NS_FATAL_ERROR

stream-flushing code can live in files associated with fatal error, and given names NS_FATAL_ERROR_WHATEVER. (We can give aliases prefixed with NS_ASSERT_ and NS_ABORT_ if required)

Question:
What's the difference between NS_ABORT_MSG and NS_FATAL_ERROR?

Comment 15 Quincy Tse 2010-06-02 01:09:09 EDT

Did some more experiments with std::set_terminate (required for changing the behaviour). A side effect of changing this behaviour is that if there are any uncaught exceptions, the type of exception and the exception message will not be displayed. This side effect is expected because uncaught exceptions also triggers std::terminate.

Another possible issue - by coding a handler, debuggers will stop at the handler code (which raises std::abort after finishing clean up). Backtrace will still show the original location where the assert/abort/uncaught exception is from.

Are these limitations acceptable?

Comment 16 Mathieu Lacage 2010-06-02 02:35:21 EDT

(In reply to comment #15)

> Are these limitations acceptable?

yes: ns-3 is explicitely not exception-safe.

Comment 17 Tom Henderson 2010-06-02 16:11:58 EDT

(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > If possible, I'd really like the program to flush and close all trace files
> > > that are open if performing a cleanup. 
> > 
> > It's not really easy to do. You need to flush all opened std::ostream files.
> > For FILE* things, it's easy to call fflush (0) but I am not aware of the
> > std::ostream equivalent. Given the implementation of the std::ostream class in
> > the libstdc++ library, I suspect that calling fflush(0) might just work, even
> > for streams opened through std::ostream but I don't have time to try this.
> 
> 
> Naah, does not work:
> 
> 
> #include <iostream>
> #include <fstream>
> #include <stdio.h>
> 
> int main(int argc, char *argv[])
> {
>   std::ofstream os;
>   char buffer[10];
>   os.open("test.foo", std::ios_base::out);
> 
>   os.write(buffer, 10);
> 
>   fflush (0); 
    ^^^^^^
    os.flush ();

> 
>   sleep    (20);
> 
>   return 0;
> }

There is another option to make the stream flush after each operation.  Try this:

  os.flush().setf(std::ios_base::unitbuf);

  os.write(buffer, 10);  // no need to call os.flush()

The above unitbuf option might degrade performance; if so, it could be default off and toggled on by an ns-3 global value or attribute (when you have an assert or leaking program).

Comment 18 Quincy Tse 2010-06-02 23:44:23 EDT

(In reply to comment #17)
> when you have an assert or leaking program

The problem is that the most code have NS_ASSERT variants, NS_ABORT variants of NS_FATAL_ERROR variants. It is good practice to put in assert to ensure various preconditions/postconditions are met.

Comment 19 Quincy Tse 2010-06-02 23:57:30 EDT

Created attachment 906 [details]
Reworked assert, abort and fatal-error with stream flushing capibilities

This patch updates assert.h, abort.h and fatal-error.h to use common code, and should produce more consistent output.

The macro currently use __gnu_cxx::__verbose_terminate_handler, which may cause incompatibility with some non-gnu compilers. (How do I check whether the compiler is g++ compliant?)

Previously defined macros can still be used as before, but their output have been changed slightly to give more info. The following have been added (I still need to write the doxygen comments):

* NS_FATAL_ERROR_NO_MSG - terminates execution via std::terminate, printing file and line info

* NS_FATAL_SET_CXX_VERBOSE - restore std::terminate to default, post-gcc3.4 behaviour
* NS_FATAL_SET_CXX_SILENT - set std::terminate to pre-gcc3.4 behaviour (just die)
* NS_FATAL_SET_DEFAULT - set std::terminate to the stream/file-flushing code.
* NS_FATAL_SET_HANDLER - set std::terminate to some user-specified custom handler code.

* NS_FATAL_REGISTER_STREAM - register a stream to be flushed on abnormal exit (calls ns3::_registerStream(ostream&))
* NS_FATAL_UNREGISTER_STREAM - unregister a stream from the list (required to prevent segfault) (calls ns3::_unregisterStream(ostream&))

void _handleTerminate (void) - the handler to use. It iterates the list<ostream*> of registered streams and call flush, calls fflush(0), and flushes cout and cerr.

Comment 20 Quincy Tse 2010-06-02 23:59:54 EDT

How do I get it to run NS_FATAL_SET_DEFAULT on start up? I don't think it'd be a good idea to require user to add an extra line (people will forget, and won't find out until they crash)

Comment 21 Tom Henderson 2010-06-03 00:32:09 EDT

(In reply to comment #18)
> (In reply to comment #17)
> > when you have an assert or leaking program
> 
> The problem is that the most code have NS_ASSERT variants, NS_ABORT variants of
> NS_FATAL_ERROR variants. It is good practice to put in assert to ensure various
> preconditions/postconditions are met.

There seem to be two issues in this tracker item, fixing up the ASSERT, ABORT, FATAL_ERROR behavior, and getting streams to flush upon abnormal exit.  One solution to the latter is a stream handler, but wouldn't it be simpler to just have the user toggle std::ios_base::unitbuf (or explicit flush()) on all trace streams when the user runs into a problematic program?  Also, wouldn't that option also take care of the case in which memory leaks cause traces not to be written upon exit?  (unless missing traces are useful as a crude leak detector...)

Comment 22 Quincy Tse 2010-06-03 01:14:47 EDT

(In reply to comment #21)
> There seem to be two issues in this tracker item, fixing up the ASSERT, ABORT,
> FATAL_ERROR behavior, and getting streams to flush upon abnormal exit.

Agreed - it originally started as fixing the behaviour. But the proposed solutions opened up the possibility to flushing files and streams. Maybe a second bug should be started for the stream flushing bit and move those bits there.

> but wouldn't it be simpler to just
> have the user toggle std::ios_base::unitbuf (or explicit flush()) on all trace
> streams when the user runs into a problematic program?

That's one option (I'm not sure about the extra IO cost - I haven't measured it). The problem here would be that many people would just rely on streams that are opened behind the scene (eg. ascii/pcap traces, etc). I wouldn't expect an NS3 user would go around and tinker with someone elses' code just to get the traces out.

> Also, wouldn't that
> option also take care of the case in which memory leaks cause traces not to be
> written upon exit?  (unless missing traces are useful as a crude leak
> detector...)

By memory leak, I assume you're referring to the case when destructor was not called? (eg. printing summary when some statistic class was destroyed) Turning off write buffer can't help with things that have never been put into the stream yet (ie the case I'm referring to). Also, if the program crashes, destructors won't normally be called anyway.

Comment 23 Quincy Tse 2010-06-03 01:21:49 EDT

(In reply to comment #22)
> (In reply to comment #21)
> > There seem to be two issues in this tracker item, fixing up the ASSERT, ABORT,
> > FATAL_ERROR behavior, and getting streams to flush upon abnormal exit.
> 
> Agreed - it originally started as fixing the behaviour. But the proposed
> solutions opened up the possibility to flushing files and streams. Maybe a
> second bug should be started for the stream flushing bit and move those bits
> there.

I've put the stream-flushing part into bug #933. It's really conceptually 2 different issues. I'll fix up the patch for this bug.

Comment 24 Quincy Tse 2010-06-03 01:46:26 EDT

Created attachment 908 [details]
Patch changing the behaviour of assert, abort and fatal_error

Comment 25 Quincy Tse 2010-06-03 07:35:08 EDT

I think that patch should be sufficient for this bug. The flushing part I've split out to bug #933. Mathieu - could you please review and comment/commit?

Comment 26 Mathieu Lacage 2010-06-03 07:38:04 EDT

(In reply to comment #17)

> >   fflush (0); 
>     ^^^^^^
>     os.flush ();

This sample code was not trying to flush individual streams. It was trying to show how you can flush all streams at once with a single function call. fflush (0) is supposed to flush all opened FILE * and I was wondering if there was an equivalent for std::ostream objects which does not appear to exist, hence, the other suggestions to keep track of all opened std::ostream objects manually.

Comment 27 Quincy Tse 2010-06-22 21:31:33 EDT

Changeset for bug 933 incorporates this bug.