HOWTO understand and find cause of exited with code -11 errors

From Nsnam

Latest revision as of 00:21, 23 April 2010

One of the most common questions we hear on the ns-3 developers list is a variation on the following theme: I wrote my program, but when I run it I get a red line that ends with "exited with code -11". Please tell me what I did wrong.

The complete output from waf will look something like,

 ./waf --run hs
 Waf: Entering directory `/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build'
 Waf: Leaving directory `/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build'
 'build' finished successfully (0.881s)
 Command ['/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build/debug/scratch/hs'] exited with code -11

In this HOWTO, we describe what this means and how you can go about finding your problem.

HOWTO understand and find cause of exited with code -11 errors

The zeroth thing to understand about debugging is that one of the least productive things you can do is post a pile of your code on a developers list and ask why it doesn't work. Developers are very busy people who won't have a lot of spare time to do your work for you. Try and figure it out on your own. You are doing the right thing by reading this page!

The first thing to understand is that debugging anything is an art and skill that you need to learn. Some jokers have observed that programming can be defined as the act of introducing bugs. This is not too far from the truth (which is why it is funny). Since you will be programming in the ns-3 environment, you are going to have to develop debugging skills, whether you like it or not, in order to remove the bugs you create. This HOWTO is only going to scratch the surface of the subject of debugging and hopefully provide you with a direction and a few hints regarding how to start. You are going to have to figure most of this out on your own, though. Don't worry, it gets easier.

It will be much easier on you if you learn from the experience of others. There are many books available that will help you learn the details of this huge subject. If you go to Amazon.com and search for "debugging" in their books section, you will find over 2,000 results. A couple of books that have been recommended on ns-developers are

  • Agans, "Debugging"
  • Matloff and Salzman, "The Art of Debugging with GDB, DDD, and Eclipse"

Reproduce the Problem

If you have read a good book on debugging, you will know that the first step in finding any problem is to figure out how to reproduce it. In this case, we need to produce it, so let's take the simplest ns-3 example and create a reproducible problem.

The hello-simulator.cc example, you may recall, just prints the text "Hello Simulator" on your console using the ns-3 logging system. It is simple enough that we can reproduce it here:

 #include "ns3/core-module.h"
 NS_LOG_COMPONENT_DEFINE ("HelloSimulator");
 using namespace ns3;
 
 int
 main (int argc, char *argv[])
 {
   NS_LOG_UNCOND ("Hello Simulator");
 }

Go ahead and copy the example into the scratch directory. The following assumes that you are in the base directory of an ns-3 distribution (the directory where RELEASE, VERSION and src are found).

 cp examples/tutorial/hello-simulator.cc scratch/hs.cc

Now pull up the file you just created (scratch/hs.cc) in your favorite programmer's editor and add a line so that the main function looks like this:

 int
 main (int argc, char *argv[])
 {
   NS_LOG_UNCOND ("Hello Simulator");
   return 1;
 }

Go ahead and build and run the new program:

 ./waf
 ./waf --run hs

You should see something that looks like:

 Waf: Entering directory `/your/directory/path/ns-3-allinone-dev/ns-3-dev/build'
 Waf: Leaving directory `/your/directory/path/ns-3-allinone-dev/ns-3-dev/build'
 'build' finished successfully (0.872s)
 Hello Simulator
 Command ['/your/directory/path/ns-3-allinone-dev/ns-3-dev/build/debug/scratch/hs'] exited with code 1

You should now have a reproducible bug, since if you repeat the waf run command, your program exits with code 1 every time.

What the Problem Means

The short answer is that the program did not return a zero as its exit or return code. Waf reports this back in red since it usually means that the program has failed in some way. This return code can either come from the return value from your main function, or it can be supplied by the operating system or run-time system if your program does not complete for some reason.

In general, strictly positive return codes indicate a program that completed "normally" (that is, the main function returned some value) but detected some error. In the code above, the hs program completed normally, but returned the value one. In real-world programs, this would indicate an error condition that you as a user could look up in the hs documentation and interpret.

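Negative return codes typically indicate that the program has failed in some way such that it cannot complete. In Unix and Linux, these codes are usually the negative of a so-called SIGNAL. You can find a list of signals in /usr/include/asm/signal.h if you are interested (the exact location of this header varies between distributions, and a few of the numbers differ between Unix variants). On a typical Linux system, the first few are: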

 #define SIGHUP   1
 #define SIGINT   2
 #define SIGQUIT  3
 #define SIGILL   4
 #define SIGTRAP  5
 #define SIGABRT  6
 #define SIGIOT   6
 #define SIGBUS   7
 #define SIGFPE   8
 #define SIGKILL  9
 #define SIGUSR1  10
 #define SIGSEGV  11
 #define SIGUSR2  12
 #define SIGPIPE  13
 #define SIGALRM  14
 #define SIGTERM  15

In this case, you may infer that if your program returns an exit code of "-11", the root cause is something called a SIGSEGV signal, since its defined value is 11 and the reported code is the negative of the signal number.
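The lookup described above can also be done in code. The following is a small sketch, assuming a POSIX C library that provides strsignal; DescribeExitCode is a hypothetical helper name, not something supplied by ns-3 or waf. It takes the number from the red line and turns it into a readable description.

```cpp
#include <cstring> // strsignal (POSIX, declared in <string.h>)
#include <string>

// Hypothetical helper: describe a code in the convention used on this page
// (zero for success, positive for a value returned by main, negative for
// the negated number of the signal that stopped the program).
std::string
DescribeExitCode (int code)
{
  if (code == 0)
    {
      return "success";
    }
  if (code > 0)
    {
      return "main returned " + std::to_string (code);
    }
  int sig = -code;
  const char *name = strsignal (sig); // text is supplied by the C library
  return "killed by signal " + std::to_string (sig) + " ("
         + (name ? name : "unknown") + ")";
}
```

Calling DescribeExitCode (-11) gives a string that begins with "killed by signal 11"; the text in parentheses comes from the C library and may differ between systems.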

Google is your friend. If you search for sigsegv, you will find a nice Wikipedia entry: http://en.wikipedia.org/wiki/SIGSEGV which then points you to another Wikipedia page: http://en.wikipedia.org/wiki/Segmentation_fault

On that page, you will find a reasonable definition of a segmentation violation:

 A segmentation fault (often shortened to segfault) or access violation is a 
 particular error condition that can occur during the operation of computer 
 software. A segmentation fault occurs when a program attempts to access a 
 memory location that it is not allowed to access, or attempts to access a
 memory location in a way that is not allowed (for example, attempting to 
 write to a read-only location, or to overwrite part of the operating 
 system)

This is what is happening when you run your program and you see the dreaded red message from waf:

 Command ['/your/directory/path/ns-3-allinone-dev/ns-3-dev/build/debug/scratch/hs'] exited with code -11

Let's Reproduce One of Those

Pull up the file you created (scratch/hs.cc) in your favorite programmer's editor and change that line you added so that the main function looks like this:

 int
 main (int argc, char *argv[])
 {
   NS_LOG_UNCOND ("Hello Simulator");
   *(char *)0 = 0;
 }

If you build and run, you should now see that waf highlights the fact that your program crashes with a segmentation fault by displaying the infamous red line:

 Waf: Entering directory `/your/directory/path/ns-3-allinone-dev/ns-3-dev/build'
 Waf: Leaving directory `/your/directory/path/ns-3-allinone-dev/ns-3-dev/build'
 'build' finished successfully (0.872s)
 Hello Simulator
 Command ['/your/directory/path/ns-3-allinone-dev/ns-3-dev/build/debug/scratch/hs'] exited with code -11

What you have done by adding the line,

   *(char *)0 = 0;

is to try to write a zero byte to address zero. On essentially every modern operating system, the page containing address zero is deliberately left unmapped in your program's address space, so that accesses through a null pointer are caught instead of silently corrupting memory. The access is therefore illegal; your operating system detects the attempt and summarily stops your program by sending it a SIGSEGV. This is called "a crash."

So, when your program exits with a SIGSEGV, it has done something that the operating system considers as bad. The red line with the error code from waf is simply telling you what has happened. Your next job is to figure out what you did that the operating system doesn't like.
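In real programs the bad address is rarely written out as a literal zero. A more common pattern is a pointer that was never set and is later dereferenced. The following plain C++ sketch shows the shape of the problem and one defensive check; Device and NameOf are hypothetical names invented for this illustration and are not ns-3 classes or functions.

```cpp
#include <string>

// Hypothetical stand-in for an object you expected to have been created.
struct Device
{
  std::string name;
};

// Return the device name, or a placeholder if the pointer was never set.
// Without the check, reading device->name through a null pointer is the
// kind of access that typically ends in a SIGSEGV.
std::string
NameOf (const Device *device)
{
  if (device == nullptr)
    {
      return "(no device)";
    }
  return device->name;
}
```

A check like this does not fix the underlying mistake (the pointer should have been set), but it turns a crash into a message you can act on.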

Finding and Fixing the Problem

Since there is an effectively infinite number of ways you can introduce a segmentation violation into your code, there is no way I can tell you how to fix your code. What I can do is to explain how to run your program in a debugger so you can see the point at which the operating system decided your program has gone bad. There are many debuggers, and you will probably come to know and love gdb for its power and ubiquity. Let's start with something small, though. For beginners, a graphical debugger is probably the way to go, and insight is fairly intuitive to use. It turns out that insight is actually a graphical wrapper for gdb, so you can eventually get to the more powerful gdb features as you learn more; so this isn't a completely pointless exercise :-)

If your system does not come with insight, you can install the package simply by using either

 sudo yum install insight

or

 sudo apt-get install insight

There are two basic ways to run a program under a debugger in ns-3. You can run the program using a so-called command-template

 ./waf --run hs --command-template="insight %s"

or you can enter a shell, change into the appropriate directory, and run the debugger directly

 ./waf shell
 cd build/debug/scratch
 insight hs

In either case, you will end up with a new window -- an insight source window. If you click the little "running man" icon on the toolbar right under the "File" menu item, a breakpoint (google is your friend) will automatically be set for you at the start of the main function and your program will be started and run. Execution of your program will be stopped at the first source line in main, which is the NS_LOG_UNCOND that prints "Hello Simulator". The fact that the program has stopped is indicated to you by the green background at the source line. You should be seeing a window that looks like the following:

Insight.png

To the right of the "running man" are some parenthesis icons that control execution of your program. Most of them have arrows that end up pointing down (stepping "down" into functions). One of them has a red arrow pointing to the right. This is the "continue" button. If you press this button, your program will "continue" running until it exits, hits another breakpoint, or does something evil.

Go ahead and press the button. You will see a warning popup window appear that tells you that insight has "received signal SIGSEGV, Segmentation fault". You expected something like that, correct? If you dismiss the popup, insight will show you at which source line the program stopped by coloring its background green. In this case, the offending line is,

 *(char *)0 = 0;

which caused a segmentation fault by attempting to write to address zero, which lies outside the valid address space of your program.

What Next

Obviously, this HOWTO is not a place to provide a manual for the insight debugger, nor is it a place for general debugging references. You can attempt to push forward on your own by reading the insight documentation and trying to figure out debugging on your own. If you are new enough to this debugging thing to have learned anything in this HOWTO, I strongly recommend that you pick up one of the books on debugging techniques and start working through some of their examples. It will most likely save you a lot of stress.

Conclusion

As mentioned above, debugging is both an art and a skill and you can spend the rest of your life mastering it. Many of us learned the hard way by having many, many bugs master us. You can choose that way, the hard way, but we who have been down that road think it will pay off if you take a small break at this point and do some reading or ask a colleague with real experience in this area for some help and guidance.

Good luck and happy debugging!


Craigdo 20:46, 22 April 2010 (UTC)