Bug 954

Summary: Changing the simulation time resolution does not work well with attributes
Product: ns-3 Reporter: Mathieu Lacage <mathieu.lacage>
Component: coreAssignee: Peter Barnes <pdbarnes>
Status: RESOLVED FIXED    
Severity: normal CC: mathieu.lacage, ns-bugs, tomh, tommaso.pecorella, vedran
Priority: P4    
Version: ns-3-dev   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 1554, 1555    
Attachments: Peter's patch rebased on top of ns-3-dev
Add more time units and support double/integer values for all time units
Time unit patch that actually compiles and doesn't break existing tests
Time unit patch that actually works
Time units patch that actually seems to work correctly
Patch rebased on top of ns-3-dev + Peter's patch set #8

Description Mathieu Lacage 2010-07-08 04:27:33 EDT
Because the default value of all Time attributes is usually constructed with the default time resolution NS and since all these objects are constructed before the main function enters, it's really hard for users to actually be able to change the global resolution of a simulation because the Time objects created before the resolution is changed will contain values which are not coherent with the new resolution.

We need to find a way to defer the conversion of attribute Time default values until the user calls SetResolution.
Comment 1 Peter Barnes 2012-11-12 01:10:46 EST
Proposed patch submitted for code review:
https://www.nsnam.org/bugzilla/show_bug.cgi?id=954
Comment 2 Tommaso Pecorella 2012-11-12 03:48:42 EST
Can't see the patch in the attachments (and the link points to the same bugzilla page).
Comment 3 Peter Barnes 2012-11-13 13:18:24 EST
Oops.  Correct code review link:
http://codereview.appspot.com/6821106/
Comment 4 Mathieu Lacage 2012-11-23 03:50:43 EST
The overal approach sounds good, the patch looks good. (I hate the idea of adding yet another global static data structure but this is a battle we lost a long time ago)

What happens though if you do not call Time::SetResolution from your main function ? Is the system going to track all time objects ? If so, I would suggest that you delete the set when you enter Simulator::Run (and thus disable resolution changes)
Comment 5 Peter Barnes 2012-11-30 17:29:00 EST
Summary of offline discussion:
(In reply to comment #4)
> What happens though if you do not call Time::SetResolution from your main
> function ? Is the system going to track all time objects ? If so, I would
> suggest that you delete the set when you enter Simulator::Run (and thus
> disable resolution changes)

[discussion of bound callbacks deleted]

> I suppose I could be pedestrian and create a void function 
> 
>  void DeleteTimesSet () {  GetTimesSet (true); }
> 
> and then
> 
>  Simulator::Schedule ( Seconds (0), & DeleteTimesSet);

This is what I've implemented in the next patch.

> I had in mind that you would hardcode the call to SetResolution inside
> Simulator::Run to make sure that this function runs before every other
> function.

> Anyway, my first point is I think more important: I think that you do
> not want to schedule an event. You need to hardcode a conditional call
> to SetResolution inside ::Run to make sure it runs before every other
> event.

I could do that, I suppose, but if feels really out of place.  Scheduling the event from time.cc keeps all the features in one place.  Is it really so deadly to clean up at time zero?  

As an alternative, I thought I had seen something like ScheduleAtStart, but I can't find that now. (Must have been another simulator.)
Comment 6 Peter Barnes 2012-11-30 17:40:41 EST
Summary of offline discussion:
(In reply to comment #4)
> But are you actually saying that a Get function can change the state of
> your global variable ? I missed that from your patchset: this is not a
> good idea (tm) in general. There might be some specifics to your code
> that make this a great idea but it smells fishy.

Yes, it's a little weird.  The GetTimesSet function is private, and returns a static pointer to the std::set.  Once the conversion occurs, the std::set is deleted, and the pointer set to null.  All callers check for null before using it.    Because the Get is private, this unusual usage is confined to a single source file.

(Remember, the goal is a) to signal that we're no longer tracking Times, so new Times don't get added to the set, and b) to free the memory used by the set.  The alternatives I thought of seem much more cumbersome:

A second static to tell if we're still tracking Times?  Then I have to clear the set *and* update the flag, checking the flag instead of the null pointer.  And I need getter and setter for the flag.

A getter for the address of the pointer?  Then a getter for the set, and a setter to null the pointer?
)
Comment 7 Peter Barnes 2012-11-30 18:04:49 EST
Revised patch:
https://codereview.appspot.com/6821106/
Comment 8 Mathieu Lacage 2012-12-03 02:18:25 EST
(In reply to comment #5)
> > I had in mind that you would hardcode the call to SetResolution inside
> > Simulator::Run to make sure that this function runs before every other
> > function.
> 
> > Anyway, my first point is I think more important: I think that you do
> > not want to schedule an event. You need to hardcode a conditional call
> > to SetResolution inside ::Run to make sure it runs before every other
> > event.
> 
> I could do that, I suppose, but if feels really out of place.  Scheduling

I am not too worried about this: I gave a while ago on keeping the Simulator code separate from the other stuff in src/core/model.

> the event from time.cc keeps all the features in one place.  Is it really so
> deadly to clean up at time zero?  

I am worried other events might have been scheduled before and might thus execute before the time resolution is applied. I think that your code will still behave correctly if this happens so, it is only a potential performance issue so it is not a big deal. 

> 
> As an alternative, I thought I had seen something like ScheduleAtStart, but
> I can't find that now. (Must have been another simulator.)
Comment 9 Peter Barnes 2012-12-03 13:24:17 EST
(In reply to comment #8)
> (In reply to comment #5)
> > deadly to clean up at time zero?  
> 
> I am worried other events might have been scheduled before and might thus
> execute before the time resolution is applied. I think that your code will
> still behave correctly if this happens so, it is only a potential
> performance issue so it is not a big deal. 

The resolution defaults to ns on creation of the first Time.  The patch converts to the new resolution when SetResolution is called explicitly.  At no time are Time values inconsistent with the unit.  It's perfectly possible to run the whole simulation without setting the resolution; there will be no difference from current behavior.  In fact, that is what happens in the current release:  the resolution defaults to ns, the resolution can't be changed from the default by user code, and the simulator runs with the default ns resolution.  Even if there were not ~150 Times created before main(), the act of scheduling an event creates a Time, so there is never any ambiguity about the Time resolution, especially once the Run is started. 

This patch enables conversion of extant Time objects, triggered by explicit call to SetResolution in user code, leaving all Times consistent with the resolution.  The event scheduled for time zero merely releases the accumulated memory; it does no conversion.
Comment 10 Mathieu Lacage 2012-12-06 12:33:13 EST
It seems that I did not make it clear enough that I was ok with the patch for merging so, +1 for merging
Comment 11 Tom Henderson 2012-12-09 00:24:12 EST
revised patch pushed in changeset: 1a2abe07b53d
Comment 12 Tom Henderson 2012-12-10 00:48:40 EST
The patch breaks python.  Rescanning the bindings didn't resolve it.  Here is a backtrace.

To reproduce:

./waf shell
gdb python
r
>>> import ns.core

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5ae30fb in ns3::GlobalValue::GetValue (this=0x7ffff5d82c60, 
    value=...) at ../src/core/model/global-value.cc:114
114	  bool ok = m_checker->Copy (*m_currentValue, value);
(gdb) bt
#0  0x00007ffff5ae30fb in ns3::GlobalValue::GetValue (this=0x7ffff5d82c60, 
    value=...) at ../src/core/model/global-value.cc:114
#1  0x00007ffff5a31b15 in GetImpl () at ../src/core/model/simulator.cc:95
#2  0x00007ffff5a3365a in ns3::Simulator::DoSchedule (time=..., impl=0x88f360)
    at ../src/core/model/simulator.cc:224
#3  0x00007ffff5a33c4b in ns3::Simulator::Schedule (time=..., f=
    0x7ffff5a1b2a8 <ns3::Time::DeleteTimesSet()>)
    at ../src/core/model/simulator.cc:244
#4  0x00007ffff5a1bb18 in ns3::Time::TimeSet (time=0x7fffffff7cc0)
    at ../src/core/model/time.cc:215
#5  0x00007ffff5a1db39 in Time (this=0x7fffffff7cc0, value=...)
    at ../src/core/model/nstime.h:405
#6  0x00007ffff5a1dacb in ns3::Time::From (from=..., timeUnit=ns3::Time::S)
    at ../src/core/model/nstime.h:382
#7  0x00007ffff5a1da3a in ns3::Time::FromDouble (value=0.10000000000000001, 
    timeUnit=ns3::Time::S) at ../src/core/model/nstime.h:355
#8  0x00007ffff5a1dc1e in ns3::Seconds (seconds=0.10000000000000001)
    at ../src/core/model/nstime.h:613
#9  0x00007ffff5b1140e in ns3::RealtimeSimulatorImpl::GetTypeId ()
    at ../src/core/model/realtime-simulator-impl.cc:60
#10 0x00007ffff5b19acf in XRealtimeSimulatorImplRegistrationClass (this=
    0x7ffff5d8a190) at ../src/core/model/realtime-simulator-impl.cc:42
#11 0x00007ffff5b19745 in __static_initialization_and_destruction_0 (
    __initialize_p=1, __priority=65535)
    at ../src/core/model/realtime-simulator-impl.cc:42
#12 0x00007ffff5b19ab2 in global constructors keyed to realtime_simulator_impl.cc () at ../src/core/model/realtime-simulator-impl.cc:854
#13 0x00007ffff5b27df6 in __do_global_ctors_aux ()
   from /home/tomh/ns-3-allinone/ns-3-dev/build/libns3-dev-core-debug.so
#14 0x00007ffff5a0c3c3 in _init ()
   from /home/tomh/ns-3-allinone/ns-3-dev/build/libns3-dev-core-debug.so
#15 0x00007fff00000000 in ?? ()
#16 0x00007ffff7debd65 in call_init (main_map=0x943e00, argc=-170431720, argv=
    0x7fffffffe438, env=0x7fffffffe448) at dl-init.c:70
#17 _dl_init (main_map=0x943e00, argc=-170431720, argv=0x7fffffffe438, env=
    0x7fffffffe448) at dl-init.c:134
#18 0x00007ffff7df0832 in dl_open_worker (a=<value optimized out>)
    at dl-open.c:463
#19 0x00007ffff7deb9c6 in _dl_catch_error (objname=<value optimized out>, 
    errstring=<value optimized out>, mallocedp=<value optimized out>, 
    operate=<value optimized out>, args=<value optimized out>)
    at dl-error.c:178
#20 0x00007ffff7defffa in _dl_open (file=
    0x7fffffff8750 "/home/tomh/hg/dec12/ns-3-allinone/ns-3-dev/build/bindings/py

...
Comment 13 Vedran Miletić 2012-12-10 07:26:26 EST
Perhaps we could ask Gustavo if he is willing to take a look.
Comment 14 Tom Henderson 2012-12-11 14:25:00 EST
I have observed that this can also occur for non-Python programs (similar backtrace).  I've also observed one of my test platforms to suddenly start working with no crashes.  This smells to me like a static initialization order issue (dependencies between global values?).
Comment 15 Tom Henderson 2012-12-11 22:44:17 EST
The backtrace shows what is going on, which is a static initialization ordering problem triggered by this patch:

The RealtimeSimulatorImpl registration class is instantiated

#9  0x00007ffff5b1140e in ns3::RealtimeSimulatorImpl::GetTypeId ()
    at ../src/core/model/realtime-simulator-impl.cc:60
#10 0x00007ffff5b19acf in XRealtimeSimulatorImplRegistrationClass (this=
    0x7ffff5d8a190) at ../src/core/model/realtime-simulator-impl.cc:42
#11 0x00007ffff5b19745 in __static_initialization_and_destruction_0 (
    __initialize_p=1, __priority=65535)

one of its attributes requires (via MakeTimeChecker()) to eventually call the newly introduced TimeSet() method:

#4  0x00007ffff5a1bb18 in ns3::Time::TimeSet (time=0x7fffffff7cc0)
    at ../src/core/model/time.cc:215
#5  0x00007ffff5a1db39 in Time (this=0x7fffffff7cc0, value=...)
    at ../src/core/model/nstime.h:405
#6  0x00007ffff5a1dacb in ns3::Time::From (from=..., timeUnit=ns3::Time::S)
    at ../src/core/model/nstime.h:382
#7  0x00007ffff5a1da3a in ns3::Time::FromDouble (value=0.10000000000000001, 
    timeUnit=ns3::Time::S) at ../src/core/model/nstime.h:355
#8  0x00007ffff5a1dc1e in ns3::Seconds (seconds=0.10000000000000001)
    at ../src/core/model/nstime.h:613
#9  0x00007ffff5b1140e in ns3::RealtimeSimulatorImpl::GetTypeId ()
    at ../src/core/model/realtime-simulator-impl.cc:60

which causes an event to be scheduled at time zero to delete the set:

#3  0x00007ffff5a33c4b in ns3::Simulator::Schedule (time=..., f=
    0x7ffff5a1b2a8 <ns3::Time::DeleteTimesSet()>)
    at ../src/core/model/simulator.cc:244

which causes the scheduler to try to figure out what type of simulation implementation is being used (which is stored in a global value):

#0  0x00007ffff5ae30fb in ns3::GlobalValue::GetValue (this=0x7ffff5d82c60, 
    value=...) at ../src/core/model/global-value.cc:114
#1  0x00007ffff5a31b15 in GetImpl () at ../src/core/model/simulator.cc:95
#2  0x00007ffff5a3365a in ns3::Simulator::DoSchedule (time=..., impl=0x88f360)
    at ../src/core/model/simulator.cc:224

but this global value constructor has not been called yet, causing a segmentation fault.

A different type of solution needs to be found, outside of scheduling an event at time 0 from the time code.  If there was some state that is guaranteed to pass through before executing events at time 0, to which we could hook this DeleteTimeSet callback, that might work.  We talked a bit in the past about having this kind of "PreRun()" state in the simulator but AFAIK it is not supported.

It seems to me that a corollary to this is that we should document that no static objects or attributes thereof should be involved in scheduling events, to avoid this problem in the future.
Comment 16 Mathieu Lacage 2012-12-12 04:10:48 EST
(In reply to comment #15)
> The backtrace shows what is going on, which is a static initialization
> ordering problem triggered by this patch:
> 
> The RealtimeSimulatorImpl registration class is instantiated
> 
> #9  0x00007ffff5b1140e in ns3::RealtimeSimulatorImpl::GetTypeId ()
>     at ../src/core/model/realtime-simulator-impl.cc:60
> #10 0x00007ffff5b19acf in XRealtimeSimulatorImplRegistrationClass (this=
>     0x7ffff5d8a190) at ../src/core/model/realtime-simulator-impl.cc:42
> #11 0x00007ffff5b19745 in __static_initialization_and_destruction_0 (
>     __initialize_p=1, __priority=65535)
> 
> one of its attributes requires (via MakeTimeChecker()) to eventually call
> the newly introduced TimeSet() method:
> 
> #4  0x00007ffff5a1bb18 in ns3::Time::TimeSet (time=0x7fffffff7cc0)
>     at ../src/core/model/time.cc:215
> #5  0x00007ffff5a1db39 in Time (this=0x7fffffff7cc0, value=...)
>     at ../src/core/model/nstime.h:405
> #6  0x00007ffff5a1dacb in ns3::Time::From (from=..., timeUnit=ns3::Time::S)
>     at ../src/core/model/nstime.h:382
> #7  0x00007ffff5a1da3a in ns3::Time::FromDouble (value=0.10000000000000001, 
>     timeUnit=ns3::Time::S) at ../src/core/model/nstime.h:355
> #8  0x00007ffff5a1dc1e in ns3::Seconds (seconds=0.10000000000000001)
>     at ../src/core/model/nstime.h:613
> #9  0x00007ffff5b1140e in ns3::RealtimeSimulatorImpl::GetTypeId ()
>     at ../src/core/model/realtime-simulator-impl.cc:60
> 
> which causes an event to be scheduled at time zero to delete the set:
> 
> #3  0x00007ffff5a33c4b in ns3::Simulator::Schedule (time=..., f=
>     0x7ffff5a1b2a8 <ns3::Time::DeleteTimesSet()>)
>     at ../src/core/model/simulator.cc:244
> 
> which causes the scheduler to try to figure out what type of simulation
> implementation is being used (which is stored in a global value):
> 
> #0  0x00007ffff5ae30fb in ns3::GlobalValue::GetValue (this=0x7ffff5d82c60, 
>     value=...) at ../src/core/model/global-value.cc:114
> #1  0x00007ffff5a31b15 in GetImpl () at ../src/core/model/simulator.cc:95
> #2  0x00007ffff5a3365a in ns3::Simulator::DoSchedule (time=...,
> impl=0x88f360)
>     at ../src/core/model/simulator.cc:224

yes, I already pointed out to peter that it would be nicer to call a Time function from the Simulator::Run function instead. I just made this change and the segfault went away. 

changeset: 8934b7c0c1cb

> 
> but this global value constructor has not been called yet, causing a
> segmentation fault.
> 
> A different type of solution needs to be found, outside of scheduling an
> event at time 0 from the time code.  If there was some state that is
> guaranteed to pass through before executing events at time 0, to which we
> could hook this DeleteTimeSet callback, that might work.  We talked a bit in
> the past about having this kind of "PreRun()" state in the simulator but
> AFAIK it is not supported.
> 
> It seems to me that a corollary to this is that we should document that no
> static objects or attributes thereof should be involved in scheduling
> events, to avoid this problem in the future.
Comment 17 Vedran Miletić 2012-12-12 04:29:24 EST
Mathieu, did you push it to ns-3-dev?
Comment 18 Mathieu Lacage 2012-12-12 05:09:20 EST
(In reply to comment #17)
> Mathieu, did you push it to ns-3-dev?

I just did
Comment 19 Vedran Miletić 2012-12-12 06:29:18 EST
Mathieu, this is really impressive and I believe a lot of people (including me) can learn about ns-3 from this. Can you elaborate on your solution to the problem?
Comment 20 Mathieu Lacage 2012-12-12 07:39:38 EST
(In reply to comment #19)
> Mathieu, this is really impressive and I believe a lot of people (including
> me) can learn about ns-3 from this. Can you elaborate on your solution to
> the problem?

Instead of calling Schedule (to schedule an event at time zero to call TimesSetDelete) from the guts of the Time code which could be called from pretty much anywhere (included from a static constructor which is what most people who got this segfault saw), we call Time::TimesSetDelete from the Simulator::Run function when we enter it before calling the Run function from the simulator impl.

I actually renamed the Delete/Convert functions to Time::FreezeResolution while I was at it because I felt it conveyed the intend of the function better. I also and changed a couple of implementation details because it simply made more sense to me that way. The latter changes are purely subjective so I ought to be flogged for making them but it was easier for me to understand the new code.
Comment 21 Mathieu Lacage 2012-12-12 07:43:06 EST
(In reply to comment #20)
> (In reply to comment #19)
> > Mathieu, this is really impressive and I believe a lot of people (including
> > me) can learn about ns-3 from this. Can you elaborate on your solution to
> > the problem?
> 
> Instead of calling Schedule (to schedule an event at time zero to call
> TimesSetDelete) from the guts of the Time code which could be called from
> pretty much anywhere (included from a static constructor which is what most
> people who got this segfault saw), we call Time::TimesSetDelete from the
> Simulator::Run function when we enter it before calling the Run function
> from the simulator impl.
> 
> I actually renamed the Delete/Convert functions to Time::FreezeResolution
> while I was at it because I felt it conveyed the intend of the function
> better. I also and changed a couple of implementation details because it
> simply made more sense to me that way. The latter changes are purely
> subjective so I ought to be flogged for making them but it was easier for me
> to understand the new code.

And to answer the question you might ask: no, I did not know before peter checked in this change (calling Schedule) that it would break that way but I was uncomfortable about it during the code review and I felt it was more safe to call a Time function from the Simulator::Run function directly. I did not insist on this specific implementation detail because I could not see how it would break.
Comment 22 Peter Barnes 2012-12-12 13:19:52 EST
(In reply to comment #21)
> And to answer the question you might ask: no, I did not know before peter
> checked in this change (calling Schedule) that it would break that way but I
> was uncomfortable about it during the code review and I felt it was more
> safe to call a Time function from the Simulator::Run function directly. I
> did not insist on this specific implementation detail because I could not
> see how it would break.

And I resisted Mathieu's request because it made more sense to me to keep the cleanup visible within the Time code.  Live and learn.
Comment 23 Tom Henderson 2012-12-14 17:53:26 EST
After some off list discussion between Peter, Mathieu, and me, we have decided to revert the patch for ns-3.16 until issues are better understood.

In summary, the code as of changeset 8ef8d8bae350 had the following unintended side effect:  it made the Time constructor/destructor thread unsafe.  They need to be thread-safe because, otherwise, Python bindings can't call ScheduleWithContext() in a thread-safe way (Python calls this from a secondary thread).

This was manifested in bug 1554 (all python programs intermittently crash).

It may be possible to remedy this by making sure that Time::SetResolution() is called explicitly from all Python programs, but this means that scripts need to upgrade to this to work reliably; yes, we have made such API changes before but it is not clear (at least to me) whether it just works around the issue, or whether we could find another solution that just works transparently.

There were a few issues discussed:

- potentially we could put mutex/critical section on all TimeSet() operations, but this looks unattractive from a performance point of view
- Time constructors and destructors were previously inlined, and now they contain a call to a non-inlined method.  This performance impact hasn't been studied.

So, it appears that we need to rework this code again before resolving this bug.
Comment 24 Peter Barnes 2013-04-05 19:41:23 EDT
[Bug 954] Support Time::SetResolution in user code, v2

Revised patch addressing issues from Dec 2012.
Code review:  http://codereview.appspot.com/6821106/

This is roughly what was proposed at the ns-3 Developer meeting
in March 2013.

Notable changes from that discussion:

- Can't initialize a class static data member in the header, without C++11.
  Instead, I added a StaticInit () function, and force every compilation
  unit to invoke it.

- This means we can dispense with the static bool m_markingTime, and just
  use the non-nullness of the set pointer, which I renamed g_markingTimes.

- In my testing, it appears that the CriticalSection is unnecessary (no
  where else in src/core uses it either)

- The real problem appears to be static initialization order between
  Time and Simulator, triggered by the call in Time to Simulator::Schedule.
  I adopted Mathieu's suggestion of adding an explicit call in
  Simulator::Run to Time::ClearMarkedTimes.

There are still some lingering style issues (whitespace), but I didn't
want to clutter the code review with more noise.
Comment 25 Vedran Miletić 2013-04-06 04:32:30 EDT
I will take a look.
Comment 26 Vedran Miletić 2013-04-06 15:11:40 EDT
Created attachment 1551 [details]
Peter's patch rebased on top of ns-3-dev
Comment 27 Vedran Miletić 2013-04-06 15:14:45 EDT
Created attachment 1552 [details]
Add more time units and support double/integer values for all time units

We want to add support for time units such as minute, hour, day, week, year, and also allow both double and integer values for all time units.

I promised to Peter at developers meeting to take a look at this, and it seems that approach he proposed (overloading for all useful integer and floating point types) actually works. This is just a prototype showing the approach.
Comment 28 Vedran Miletić 2013-04-06 17:17:50 EDT
Created attachment 1553 [details]
Time unit patch that actually compiles and doesn't break existing tests
Comment 29 Vedran Miletić 2013-04-07 11:41:18 EDT
Created attachment 1554 [details]
Time unit patch that actually works

NS_LOG_UNCOND (Hours (1).GetSeconds ());

prints

3600
Comment 30 Vedran Miletić 2013-04-07 14:51:23 EDT
Created attachment 1555 [details]
Time units patch that actually seems to work correctly

Peter, I believe I have given your patch and time subsystem quite a bit of stretch and general stress during the development of this units stuff.

And it didn't break, which is very good.

I won't claim this is it, as I still plan to check the code in depth during this week when I catch some free time to do it.

Thanks for your work on this, hope other people will find it useful as I do.
Comment 31 Vedran Miletić 2013-04-13 03:04:59 EDT
Peter, I'm getting these

./ns3/nstime.h: At global scope:
./ns3/nstime.h:619:13: error: ‘ns3::g_TimeStaticInit’ defined but not used [-Werror=unused-variable]
cc1plus: all warnings being treated as errors

whenever there is an error somewhere in ns-3 code, and not otherwise. Is it something we should be concerned about?
Comment 32 Vedran Miletić 2013-05-30 11:47:52 EDT
Peter, could you provide information here about current status of the patch and further steps (including but not limited to reviewing the patch)?
Comment 33 Mathieu Lacage 2013-06-11 03:06:01 EDT
(In reply to comment #30)
> Created attachment 1555 [details]
> Time units patch that actually seems to work correctly
> 
> Peter, I believe I have given your patch and time subsystem quite a bit of
> stretch and general stress during the development of this units stuff.
> 
> And it didn't break, which is very good.
> 
> I won't claim this is it, as I still plan to check the code in depth during
> this week when I catch some free time to do it.
> 
> Thanks for your work on this, hope other people will find it useful as I do.

A for YEAR and WK for WEEK ? Why not 'Y', and 'W' respectively ?
Comment 34 Vedran Miletić 2013-08-15 07:24:38 EDT
(In reply to comment #33)
> A for YEAR and WK for WEEK ? Why not 'Y', and 'W' respectively ?

A per http://en.wikipedia.org/wiki/Unit_of_time and WK per intuition. But I'm fine with it either way.
Comment 35 Vedran Miletić 2013-08-15 09:02:10 EDT
Created attachment 1660 [details]
Patch rebased on top of ns-3-dev + Peter's patch set #8

Prerequisite: latest ns-3-dev and Peter's patch set #8 from https://codereview.appspot.com/6821106/ applied on it.
Comment 36 Tom Henderson 2013-08-19 02:04:40 EDT
the patch was committed in a few patches culminating in 73dfb0c88ed6

However, there are memory management problems with the mutex object that was introduced in the last changeset.  Specifically, the new'ed mutex object is never freed, and causes a valgrind error.  I tried a few approaches to keep an available reference around so it could be freed later; nothing working well so far.
Comment 37 Tom Henderson 2013-08-20 00:44:21 EDT
Peter pointed out the current code is an instance of "construct-on-first-use" idiom [1].

he suggested to try the other variant that shifts the risk to static deinitialization [2]; namely:

diff -r 8dc92cb1e737 src/core/model/time.cc
--- a/src/core/model/time.cc	Mon Aug 19 18:30:18 2013 +0200
+++ b/src/core/model/time.cc	Mon Aug 19 21:30:37 2013 -0700
@@ -48,8 +48,8 @@
 SystemMutex &
 GetMarkingMutex ()
 {
-  static SystemMutex * g_markingMutex = new SystemMutex;
-  return *g_markingMutex;
+  static SystemMutex g_markingMutex;
+  return g_markingMutex;
 }

This does clear the issue for cases that involve calling Simulator::Run().  However, we have a lot of examples and unit tests that do not call Simulator::Run(), and hence never call Time::ConvertTimes(), and hence g_markingTimes is never deleted, resulting still in valgrind errors for those examples and tests.


[1] http://www.parashift.com/c++-faq/static-init-order-on-first-use.html
[2] http://www.parashift.com/c++-faq/construct-on-first-use-v2.html
Comment 38 Tommaso Pecorella 2013-08-20 04:15:34 EDT
If this fixes the issue, then I'd say to add Simulator::Run() to the missing examples.

As a matter of fact, I find a little bit strange to have an example without Simulator::Run(). For the tests, we can add it as well.

Plus, we should make it clear in the documentation that Simulator::Run() *must* be called.
Comment 39 Peter Barnes 2013-08-20 13:13:28 EDT
I'd prefer a less dogmatic solution.  CommandLine, for example, should be perfectly useful, without errors, without any reference to Time (for a "mandatory" call to SetResolution), or Simulator for a "mandatory" call to Simulator::Run.
Comment 40 Vedran Miletić 2013-08-20 13:15:11 EDT
(In reply to comment #39)
> I'd prefer a less dogmatic solution.  CommandLine, for example, should be
> perfectly useful, without errors, without any reference to Time (for a
> "mandatory" call to SetResolution), or Simulator for a "mandatory" call to
> Simulator::Run.

+1

Can we avoid making the variable static?
Comment 41 Peter Barnes 2013-08-21 16:32:28 EDT
Here is a combined patch for the mutex and g_markingTimes valgrind hits, but ...

As Tom said, I have to think through the implications of doing static init this way.  It's allowed, but potentially introduces static de-initialization order bugs. (C++ FAQ 10.16) [2]

Let's apply this to the ns-3-18 rc and see if it fixes valgrind and doesn't introduce other problems, while I cogitate on the plane today.


diff -r 73dfb0c88ed6 src/core/model/time.cc
--- a/src/core/model/time.cc    Fri Aug 16 11:10:19 2013 -0700
+++ b/src/core/model/time.cc    Tue Aug 20 10:08:17 2013 -0700
@@ -48,8 +48,8 @@
 SystemMutex &
 GetMarkingMutex ()
 {
-  static SystemMutex * g_markingMutex = new SystemMutex;
-  return *g_markingMutex;
+  static SystemMutex g_markingMutex;
+  return g_markingMutex;
 }
 
 
@@ -65,7 +65,8 @@
     {
       if (! g_markingTimes)
         {
-          g_markingTimes = new Time::MarkedTimes;
+          static MarkedTimes markingTimes;
+          g_markingTimes = & markingTimes;
         }
       else
         {
@@ -234,7 +235,7 @@
   if (g_markingTimes)
     {
       NS_LOG_LOGIC ("clearing MarkedTimes");
-      delete g_markingTimes;
+      g_markingTimes->erase (g_markingTimes->begin(), g_markingTimes->end ());
       g_markingTimes = 0;
     }
 }  // Time::ClearMarkedTimes
@@ -327,7 +328,7 @@
   // Body of ClearMarkedTimes
   // Assert above already guarantees g_markingTimes != 0
   NS_LOG_LOGIC ("clearing MarkedTimes");
-  delete g_markingTimes;
+  g_markingTimes->erase (g_markingTimes->begin(), g_markingTimes->end ());
   g_markingTimes = 0;
 
 }  // Time::ConvertTimes ()
Comment 42 Tom Henderson 2013-08-21 19:29:05 EDT
(In reply to comment #41)
> Here is a combined patch for the mutex and g_markingTimes valgrind hits, but
> ...
> 
> As Tom said, I have to think through the implications of doing static init this
> way.  It's allowed, but potentially introduces static de-initialization order
> bugs. (C++ FAQ 10.16) [2]
> 
> Let's apply this to the ns-3-18 rc and see if it fixes valgrind and doesn't
> introduce other problems, while I cogitate on the plane today.

pushed as changeset 2564598017c1

Let's keep this open until if/when we agree that the issue is settled.
Comment 43 Vedran Miletić 2013-10-12 20:31:12 EDT
What's the final verdict (if any) on whether this bug should be closed?
Comment 44 Peter Barnes 2018-10-12 12:58:19 EDT
Is it time to close this issue?  It's been in the mainline since 2013, with no issues.
Comment 45 Tom Henderson 2018-10-12 14:21:18 EDT
I am OK to close this based on stability of the patch being fine, but do you think we should regression test at different resolution?

From my perspective, changing default from NS to PS is the interesting (sufficient?) case to check.

I just hand-edited time.cc to change the default to PS and almost all tests passed, and all examples passed:

List of FAILed tests:
    devices-mesh-dot11s-regression
    mobility-trace
    time

These are explainable (the first two have test vectors assuming the NS resolution).

But we don't have a global value to control resolution so as to run './test.py' with a different resolution.  Should we care?
Comment 46 Peter Barnes 2019-11-04 17:26:55 EST
Created GitLab issue #104 

https://gitlab.com/nsnam/ns-3-dev/issues/104

to capture Tom's follow up tasks: Support regression testing with different Time resolutions.