Bug 677 - gcc cxxflags plays (multiple questions)
Status: RESOLVED FIXED
Product: ns-3
Component: build system
Version: ns-3-dev
Platform: All All
Importance: P4 enhancement
Reported: 2009-09-17 11:19 EDT by
Modified: 2009-10-23 11:48 EDT (History)


Attachments
Add "release" profile (see == First ==) (323 bytes, patch)
2009-09-17 11:21 EDT, Andrey Mazo
Add option "--enable-strip" (see == First ==) (1.64 KB, patch)
2009-09-17 11:23 EDT, Andrey Mazo
Respect CXXFLAGS_EXTRA (see == Second ==) (803 bytes, patch)
2009-09-17 11:24 EDT, Andrey Mazo
gcc -pipe (see == Third ==) (474 bytes, patch)
2009-09-17 11:24 EDT, Andrey Mazo
gcc -fomit-frame-pointer (see == Fourth ==) (453 bytes, patch)
2009-09-17 11:27 EDT, Andrey Mazo




Description From 2009-09-17 11:19:52 EDT
I have several questions/proposals about compiler flags.
I've decided not to create several bugs and keep everything in one place.

== First ==
The gcc -g option makes executables 5 times larger in the static optimized
build.

What is the reason for passing the -g option to gcc in the optimized profile?
Is it required for valgrind checks?
Of course, it doesn't affect the memory footprint and should not significantly
change computation speed.
But this debugging information is useless during long production simulations
and only consumes hard drive space.
I see 2 ways:
1) add a new profile like "release" without debugging info at all
2) add a new configure option "--enable-strip" to run gcc with -s flag (or
equivalent flag for other compilers).


== Second ==
Make wscript pick up environment variables like CXXFLAGS_EXTRA and append them
to the automatically assembled CXXFLAGS (this may be useful for some
compilation-related experiments).
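
A minimal sketch of this proposal, written in Python because wscript is a Python file. The helper name `append_extra_flags` and the default flag list are hypothetical and are not taken from the attached patch; only the `CXXFLAGS_EXTRA` variable name comes from this report.

```python
import shlex


def append_extra_flags(default_flags, environ, var_name="CXXFLAGS_EXTRA"):
    """Return default_flags followed by any flags found in environ[var_name].

    Hypothetical helper: the assembled defaults are kept and the extra
    flags are appended, instead of the environment replacing the list.
    """
    # shlex.split handles quoting the way a shell would; an unset
    # variable simply contributes no flags.
    extra = shlex.split(environ.get(var_name, ""))
    return list(default_flags) + extra


# Illustrative defaults only; the real list is assembled in configure().
defaults = ["-O3", "-g", "-Wall"]
flags = append_extra_flags(defaults, {"CXXFLAGS_EXTRA": "-DMY_EXPERIMENT=1 -pipe"})
print(" ".join(flags))
```

In a real wscript the mapping passed in would presumably be `os.environ`; a plain dict is used here so the sketch runs without touching the environment.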

== Third ==
Add the gcc "-pipe" option to avoid creating temporary files (this will
slightly speed up compilation; it should be supported even under Cygwin).

== Fourth ==
Add the gcc "-fomit-frame-pointer" option.
This will slightly speed up execution (I got about a 1-3% speedup) and reduce
code size, but makes debugging impossible on several architectures.
So I think it is acceptable under the "release" profile (from the First
proposal).

== Fifth ==
Add the gcc "-march=native" option.
Although it doesn't give any significant performance gain (I think due to
intensive memory operations and bad code/data locality), it may be valuable for
future models with intensive calculations.
------- Comment #1 From 2009-09-17 11:21:32 EDT -------
Created an attachment (id=586) [details]
Add "release" profile (see == First ==)
------- Comment #2 From 2009-09-17 11:23:13 EDT -------
Created an attachment (id=587) [details]
Add option "--enable-strip" (see == First ==)
------- Comment #3 From 2009-09-17 11:24:12 EDT -------
Created an attachment (id=588) [details]
Respect CXXFLAGS_EXTRA (see == Second ==)
------- Comment #4 From 2009-09-17 11:24:57 EDT -------
Created an attachment (id=589) [details]
gcc -pipe (see == Third ==)
------- Comment #5 From 2009-09-17 11:27:22 EDT -------
Created an attachment (id=590) [details]
gcc -fomit-frame-pointer (see == Fourth ==)
------- Comment #6 From 2009-09-17 13:32:08 EDT -------
(In reply to comment #0)
> I have several questions/proposals about compiler flags.
> I've decided not to create several bugs and keep everything in one place.
> 
> == First ==
> The gcc -g option makes executables 5 times larger in the static optimized
> build.
> 
> What is the reason for passing the -g option to gcc in the optimized profile?
> Is it required for valgrind checks?

For profiling purposes (cachegrind or memprof, for example), you want your code
to be both optimized and also to contain full debugging information.

> Of course, it doesn't affect the memory footprint and should not significantly
> change computation speed.
> But this debugging information is useless during long production simulations
> and only consumes hard drive space.
> I see 2 ways:
> 1) add a new profile like "release" without debugging info at all
> 2) add a new configure option "--enable-strip" to run gcc with -s flag (or
> equivalent flag for other compilers).

The two ways are mutually exclusive, right?  I don't see the point of stripping
debug symbols; might as well not produce them in the first place!

But I am +1 on your proposed 'release' profile.

> 
> 
> == Second ==
> Make wscript pick up environment variables like CXXFLAGS_EXTRA and append them
> to the automatically assembled CXXFLAGS (this may be useful for some
> compilation-related experiments).

Meh... is this really that useful?  What is wrong with using the CXXFLAGS env
var. to completely override the default flags?

> 
> == Third ==
> Add the gcc "-pipe" option to avoid creating temporary files (this will
> slightly speed up compilation; it should be supported even under Cygwin).

gcc "-pipe": does it really make much difference?  If it is always such a good
option, why doesn't gcc use it by default?

> 
> == Fourth ==
> Add the gcc "-fomit-frame-pointer" option.
> This will slightly speed up execution (I got about a 1-3% speedup) and reduce
> code size, but makes debugging impossible on several architectures.
> So I think it is acceptable under the "release" profile (from the First
> proposal).

+0 (I abstain)

> 
> == Fifth ==
> Add the gcc "-march=native" option.
> Although it doesn't give any significant performance gain (I think due to
> intensive memory operations and bad code/data locality), it may be valuable for
> future models with intensive calculations.
> 

-march=native appears to be a good idea considering that ns-3 is not ever
installed or packaged for distributions.
------- Comment #7 From 2009-09-17 18:29:46 EDT -------
(In reply to comment #6)

Thank you for your quick reply!

> > What is the reason for passing the -g option to gcc in the optimized profile?
> > Is it required for valgrind checks?
> 
> For profiling purposes (cachegrind or memprof, for example), you want your code
> to be both optimized and also to contain full debugging information.
Thank you, I understand now.

> > I see 2 ways:
> > 1) add a new profile like "release" without debugging info at all
> > 2) add a new configure option "--enable-strip" to run gcc with -s flag (or
> > equivalent flag for other compilers).
> 
> The two ways are mutually exclusive, right?  I don't see the point of stripping
> debug symbols; might as well not produce them in the first place!
Yes, I see no reason to implement both.
I also prefer the first way, because it allows adding other flags that may
conflict with profiling, for example.
The second way is just an alternative.
I know of some packages that behave that way (compile with -g3 -ggdb, but then
strip the final executable).

> But I am +1 on your proposed 'release' profile.
Good!
Waiting for one more +1 from another maintainer?


> > == Second ==
> > Make wscript pick up environment variables like CXXFLAGS_EXTRA and append them
> > to the automatically assembled CXXFLAGS (this may be useful for some
> > compilation-related experiments).
> 
> Meh... is this really that useful?  What is wrong with using the CXXFLAGS env
> var. to completely override the default flags?
Well, I'm not sure this will be very useful for end users, because it's a
rather special use case.
But it can make it easier to experiment with compiler flags, temporary defines
and so on.
CXXFLAGS are carefully assembled throughout the whole of configure() in wscript,
so it is unwise to drop them.
Blindly overriding LINKFLAGS may be disastrous.

> > == Third ==
> > Add the gcc "-pipe" option to avoid creating temporary files (this will
> > slightly speed up compilation; it should be supported even under Cygwin).
> 
> gcc "-pipe": does it really make much difference?  If it is always such a good
> option, why doesn't gcc use it by default?
Gcc leaves many good options disabled by default. :)
I don't think the difference is really measurable, because of filesystem
buffers and caches.
There may be a more noticeable improvement with many small C files than with
several large C++ ones.
But why should we incur additional overhead?

> > == Fourth ==
> > Add the gcc "-fomit-frame-pointer" option.
> > This will slightly speed up execution (I got about a 1-3% speedup) and reduce
> > code size, but makes debugging impossible on several architectures.
> > So I think it is acceptable under the "release" profile (from the First
> > proposal).
> 
> +0 (I abstain)
Any unbiased reasons?
Debugging under the "release" profile will already be hard or impossible.

> > == Fifth ==
> > Add the gcc "-march=native" option.
> > Although it doesn't give any significant performance gain (I think due to
> > intensive memory operations and bad code/data locality), it may be valuable for
> > future models with intensive calculations.
>
> -march=native appears to be a good idea considering that ns-3 is not ever
> installed or packaged for distributions.
Good!
Again, waiting for another +1?
------- Comment #8 From 2009-09-18 09:28:42 EDT -------
(In reply to comment #7)
> > gcc "-pipe": does it really make much difference?  If it is always such a good
> > option, why doesn't gcc use it by default?
> Gcc leaves many good options disabled by default. :)
> I don't think the difference is really measurable, because of filesystem
> buffers and caches.
> There may be a more noticeable improvement with many small C files than with
> several large C++ ones.
> But why should we incur additional overhead?

I've made a special synthetic benchmark.
In short, the reduction of system CPU time (reported by time(1)) is about 20%
(reduction of NS-3 compilation time is by less than 1%).
In detail, the benchmark is the following:
1) large simple preprocessed C file (actually an XPM image about 2.6M)
2) 1000 iterations with /bin/false to measure shell forking and looping
overhead
3) "gcc -c file.i" to ensure the file is in cache
4) 1000 iterations with gcc -c file.i
5) 1000 iterations with gcc -c file.i -pipe
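
The steps above could be approximated with a small harness like the following. This is a sketch under assumptions, not the script used for the measurements: the original loop ran in a shell under time(1), while this version reads child CPU time through `resource.getrusage`, and the function name `children_cpu_time` is hypothetical.

```python
import resource
import subprocess


def children_cpu_time(argv, iterations):
    """Run argv `iterations` times and return the (user, sys) CPU seconds
    consumed by the child processes during those runs."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    for _ in range(iterations):
        # The exit status is ignored on purpose: the /bin/false baseline
        # always exits with a non-zero status.
        subprocess.call(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime
```

Calling it with `["/bin/false"]`, then `["gcc", "-c", "file.i"]`, then `["gcc", "-c", "file.i", "-pipe"]` and 1000 iterations each would mirror steps 2, 4 and 5; one untimed gcc run beforehand plays the role of step 3. The `resource` module is only available on Unix-like systems.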

Results (one run -- others are similar):
2) real    0m0.519s
   user    0m0.184s
   sys     0m0.324s

4) real    2m48.458s
   user    2m6.020s
   sys     0m33.138s

5) real    2m30.898s
   user    2m4.784s
   sys     0m26.182s

So I think the overall NS-3 compilation time will be reduced by less than 1%
(a rough estimate confirms this).
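
The percentages can be rechecked from the time(1) figures quoted in this comment. The sketch below only redoes the arithmetic; the helper names `to_seconds` and `reduction` are illustrative, and subtracting the /bin/false baseline from both runs is an assumption about how the loop overhead should be discounted.

```python
import re


def to_seconds(stamp):
    """Convert a time(1) value such as '2m48.458s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


def reduction(before, after, baseline="0m0.000s"):
    """Percentage saved going from `before` to `after`, net of loop overhead."""
    b = to_seconds(before) - to_seconds(baseline)
    a = to_seconds(after) - to_seconds(baseline)
    return 100.0 * (b - a) / b


# Figures copied from runs 2), 4) and 5) above.
sys_saving = reduction("0m33.138s", "0m26.182s", baseline="0m0.324s")
real_saving = reduction("2m48.458s", "2m30.898s", baseline="0m0.519s")
user_saving = reduction("2m6.020s", "2m4.784s", baseline="0m0.184s")
print(f"sys {sys_saving:.1f}%  real {real_saving:.1f}%  user {user_saving:.1f}%")
```

On these numbers the system-time saving comes out at roughly 21%, in line with the "about 20%" quoted above, while user time moves by about 1%.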
------- Comment #9 From 2009-09-25 07:37:37 EDT -------
There are some interesting patches here, but NS-3 is in feature freeze, so they
are better applied after the NS-3.6 release, to avoid potentially breaking the
release.
------- Comment #10 From 2009-09-25 09:06:01 EDT -------
(In reply to comment #9)
> There are some interesting patches here, but NS-3 is in feature freeze, so they
> are better applied after the NS-3.6 release, to avoid potentially breaking the
> release.

No problem.
I'm running some more benchmarks, so I can make use of the time before the open
phase. :)
------- Comment #11 From 2009-09-28 09:34:22 EDT -------
== Sixth ==
(depends on Fifth)
Instruct gcc to use SSE where possible: add gcc options like "-msse{,2,3}
-mfpmath=sse".

My measurements show that this won't give any speed improvement, but may change
precision in long simulations due to rounding.

The gcc manual says (about -mfpmath=sse):
"The resulting code should be considerably faster in the majority of cases and
avoid the numerical instability problems of 387 code, but may break some
existing code that expects temporaries to be 80bit."

For example, running a prolonged wifi-wired-bridging example compiled with and
without SSE shows that the pcap traces are identical, but the mobility traces
differ in some places. The difference isn't significant (about 2ns), but may
strongly affect some special simulations.
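
Why the precision of temporaries matters can be shown with an analogy. Python cannot choose between x87 and SSE arithmetic, so the sketch below instead compares rounding after every addition with rounding once at the end (via `math.fsum`); it illustrates the general effect only and is not a reproduction of the ns-3 difference.

```python
import math

values = [0.1] * 10

# Round to a 64-bit double after every single addition.
rounded_each_step = 0.0
for v in values:
    rounded_each_step += v

# math.fsum tracks the partial sums exactly and rounds only once at the end.
rounded_once = math.fsum(values)

print(rounded_each_step == rounded_once)
print(rounded_each_step, rounded_once)
```

The two totals differ in the last bit even though the inputs are identical, which is the kind of discrepancy that can accumulate over a long simulation.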
------- Comment #12 From 2009-10-23 08:32:23 EDT -------
The tree is now open for new enhancements.  If you have updated versions of the
patches, time to upload them...
------- Comment #13 From 2009-10-23 08:46:19 EDT -------
(In reply to comment #12)
> The tree is now open for new enhancements.  If you have updated versions of the
> patches, time to upload them...
Some of my planned enhancements didn't show any performance gains and would
only add more complexity to wscript, so I don't have any brand new patches.
------- Comment #14 From 2009-10-23 08:56:52 EDT -------
(From update of attachment 586 [details])
Feel free to commit this patch.
------- Comment #15 From 2009-10-23 08:58:40 EDT -------
(From update of attachment 587 [details])
The 'release' profile replaces this patch.
------- Comment #16 From 2009-10-23 09:00:21 EDT -------
(From update of attachment 588 [details])
Please add support for a CCFLAGS_EXTRA, besides CXXFLAGS_EXTRA.  Even if
ns-3-dev does not use pure C code, I know at least Mathieu is working on pure C
code in a branch.  And for consistency.  After this change, feel free to
commit.
------- Comment #17 From 2009-10-23 09:02:21 EDT -------
(From update of attachment 589 [details])
I am still unconvinced about this patch.  Why bother if the difference is so
small?  I say let's drop it.
------- Comment #18 From 2009-10-23 09:13:39 EDT -------
-fomit-frame-pointer and -march=native, ok, I guess, but only for the 'release'
profile.
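
One way to picture the outcome is a per-profile flag table. This is a hypothetical sketch: the `PROFILE_FLAGS` name, the `-O` levels and the layout are assumptions for illustration and may not match how wscript actually organises its profiles. The only points taken from this thread are that the optimized profile keeps `-g` and that `-fomit-frame-pointer` and `-march=native` are limited to 'release'.

```python
# Hypothetical profile table; the real wscript may be organised differently.
PROFILE_FLAGS = {
    "debug": ["-O0", "-g"],
    "optimized": ["-O3", "-g"],  # -g kept so profilers can map code to source
    "release": ["-O3", "-fomit-frame-pointer", "-march=native"],
}


def flags_for(profile):
    """Return a copy of the compiler flags for a build profile."""
    try:
        return list(PROFILE_FLAGS[profile])
    except KeyError:
        raise ValueError(f"unknown build profile: {profile!r}") from None


for name in PROFILE_FLAGS:
    print(name, " ".join(flags_for(name)))
```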
------- Comment #19 From 2009-10-23 09:36:48 EDT -------
(In reply to comment #14)
> (From update of attachment 586 [details])
> Feel free to commit this patch.
changeset 15524c57a627
------- Comment #20 From 2009-10-23 09:37:16 EDT -------
(In reply to comment #16)
> (From update of attachment 588 [details])
> Please add support for a CCFLAGS_EXTRA, besides CXXFLAGS_EXTRA.  Even if
> ns-3-dev does not use pure C code, I know at least Mathieu is working on pure C
> code in a branch.  And for consistency.  After this change, feel free to
> commit.
changeset 6db6a279dfff
------- Comment #21 From 2009-10-23 09:37:50 EDT -------
(In reply to comment #18)
> -fomit-frame-pointer and -march=native, ok, I guess, but only for the 'release'
> profile.
changeset 8878efe25b6c
------- Comment #22 From 2009-10-23 09:41:17 EDT -------
(In reply to comment #17)
> (From update of attachment 589 [details])
> I am still unconvinced about this patch.  Why bother if the difference is so
> small?  I say let's drop it.
Well, it seems you're right here, since no one else finds this helpful.
Closing bug as FIXED?