Bug 1410 - Assert in DefaultSimulatorImpl breaks Visualizer module
Assert in DefaultSimulatorImpl breaks Visualizer module
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: core
ns-3-dev
All All
: P5 critical
Assigned To: Zack Weinberg
:
: 1417 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-18 11:52 EDT by Alex Afanasyev
Modified: 2012-05-19 03:48 EDT (History)
4 users (show)

See Also:


Attachments
patch (2.55 KB, patch)
2012-04-22 20:58 EDT, Zack Weinberg
Details | Diff
patch to make the visualizer work again (2.83 KB, patch)
2012-04-24 10:04 EDT, Mathieu Lacage
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Afanasyev 2012-04-18 11:52:05 EDT
Assert on line 209 of src/core/model/default-simulator-impl.cc  makes visualizer module unusable. 

Although I understand that this asserts suppose to prevent event scheduling from non-primary thread, visualizer relies on a separate thread to schedule events...  Probably there should be some changes in VisualSimulatorImpl, but I'm not sure what exactly should be changed.
Comment 1 Mathieu Lacage 2012-04-18 13:06:34 EDT
Someone is calling Schedule when it should be calling ScheduleWithContext. If you could post a backtrace when the assert fails, we would know who is the culprit.
Comment 2 Alex Afanasyev 2012-04-18 13:34:04 EDT
OK. Here is the backtrace:

--- begin cut ---

(gdb) thread apply all bt

Thread 4 (process 23056):
#0  ns3::DefaultSimulatorImpl::Schedule (this=0x104767be0, time=@0x108f9d3b0, event=0x109200170) at ../src/core/model/default-simulator-impl.cc:214
#1  0x0000000100e3cdbe in ns3::VisualSimulatorImpl::Schedule (this=0x104767ba0, time=@0x108f9d3b0, event=0x109200170) at ../src/visualizer/model/visual-simulator-impl.cc:145
#2  0x00000001031db38c in ns3::Simulator::DoSchedule (time=@0x108f9d3b0, impl=0x109200170) at ../src/core/model/simulator.cc:216
#3  0x0000000100e3c5ed in ns3::Simulator::Schedule<void (ns3::PyViz::*)(), ns3::PyViz*> (time=@0x108f9d3b0, mem_ptr={__pfn = 0x100e1e97e <ns3::PyViz::CallbackStopSimulation()>, __delta = 0}, obj=0x108504600) at simulator.h:822
#4  0x0000000100e1e84a in ns3::PyViz::SimulatorRunUntil (this=0x108504600, time=@0x108f9d4e0) at ../src/visualizer/model/pyviz.cc:321
#5  0x000000010573a9f0 in _wrap_PyNs3PyViz_SimulatorRunUntil (self=0x105040470, args=0x108492250, kwargs=0x0) at src/visualizer/bindings/ns3module.cc:605
#6  0x00000001043011f3 in PyEval_EvalFrameEx ()
... skipped
#17 0x0000000104306b01 in PyEval_CallObjectWithKeywords ()
#18 0x000000010433ca22 in t_bootstrap ()
#19 0x00007fff845ed8bf in _pthread_start ()
#20 0x00007fff845f0b75 in thread_start ()

Thread 1 (process 23056):
#0  0x00007fff8460b988 in szone_realloc ()
#1  0x00007fff8463f243 in malloc_zone_realloc ()
#2  0x00007fff84640032 in realloc ()
... skipped
#31 0x0000000104326bd4 in PyRun_SimpleStringFlags ()
#32 0x0000000100e3d25b in ns3::VisualSimulatorImpl::Run (this=0x104767ba0) at ../src/visualizer/model/visual-simulator-impl.cc:121
#33 0x00000001031dc0bc in ns3::Simulator::Run () at ../src/core/model/simulator.cc:156
#34 0x0000000100003b93 in main (argc=2, argv=0x7fff5fbff370) at ../scratch/ccnx-grid.cc:124

--- end cut ---

So, the problem is nailed down to src/visualizer/model/pyviz.cc:321, where there is Schedule that is performed from GUI thread. 
Unfortunately, I cannot just replace it with ScheduleWithContext.   The reason, which bothered me several times already, is in the fact that when you schedule without context, you can get EventId and later cancel the event if necessary.   When ScheduleWithContext, you're not getting EventId and have no means to cancel the event.  I didn't look into detail, but is it something fundamental?  Why ScheduleWithContext returns void?

Is there any other way to solve src/visualizer/model/pyviz.cc:321 ?
Comment 3 Mathieu Lacage 2012-04-18 18:13:28 EDT
(In reply to comment #2)
> So, the problem is nailed down to src/visualizer/model/pyviz.cc:321, where
> there is Schedule that is performed from GUI thread. 
> Unfortunately, I cannot just replace it with ScheduleWithContext.   The reason,
> which bothered me several times already, is in the fact that when you schedule
> without context, you can get EventId and later cancel the event if necessary.  
> When ScheduleWithContext, you're not getting EventId and have no means to
> cancel the event.  I didn't look into detail, but is it something fundamental? 

yes, unfortunately. We do not want to allow multi-threaded access to the EventId: that would cost too much performance wise.

> Why ScheduleWithContext returns void?
> 
> Is there any other way to solve src/visualizer/model/pyviz.cc:321 ?

Without looking at the code, I would say that you would need to figure out how to rewrite the code to not need the EventId.
Comment 4 Alex Afanasyev 2012-04-18 18:20:15 EDT
Then I have a question about Schedule/ScheduleWithContext in general. What is this context and what is its relation to threads?   I kind of guessed that in network simulations 'context' is ID of the node, but it is not related to threads...

Can we just use some critical sections/mutexes to protect the scheduler, in case if events are scheduled/cancelled from different threads?
Comment 5 Zack Weinberg 2012-04-22 20:58:30 EDT
Created attachment 1387 [details]
patch

I hope you don't mind my jumping in, but I need pyviz to work for my project and it looks like a relatively easy fix: instead of having the PyViz object hold onto an event ID it gets back from Schedule, have it fabricate an EventImpl object itself and pass that to ScheduleWithContext.  Then it can directly cancel the event, later.

I don't understand what the "context" argument to ScheduleWithContext is for, so I just pass zero, which appears to work fine.  Also I may not have understood how to use your smart pointers correctly.  Other than that this seems like a safe fix.
Comment 6 Gustavo J. A. M. Carneiro 2012-04-24 05:50:01 EDT
Guys, I haven't looked in detail whether pyviz could be modified to adapt.  However, it is clear to me that the new assertions added to the scheduler, block out perfectly legal usage, and so, IMHO, they should simply be removed.  PyViz accesses the ns-3 API from multiple threads, but it uses a mutex to make sure that only one thread accesses ns-3 at any given time.
Comment 7 Gustavo J. A. M. Carneiro 2012-04-24 05:50:43 EDT
*** Bug 1417 has been marked as a duplicate of this bug. ***
Comment 8 Mathieu Lacage 2012-04-24 08:37:39 EDT
(In reply to comment #5)
> Created attachment 1387 [details]
> patch
> 
> I hope you don't mind my jumping in, 

Not at all

> but I need pyviz to work for my project

good.

> and it looks like a relatively easy fix: instead of having the PyViz object
> hold onto an event ID it gets back from Schedule, have it fabricate an
> EventImpl object itself and pass that to ScheduleWithContext.  Then it can
> directly cancel the event, later.

This probably works with the default scheduler but it is pushing the boundary of what the API was designed for and it will certainly not work with other kinds of schedulers.

I am really not sure what this stop event is here for.

> 
> I don't understand what the "context" argument to ScheduleWithContext is for,
> so I just pass zero, which appears to work fine.  Also I may not have

you lose :)

0xffffffff is the magic value you want.
Comment 9 Mathieu Lacage 2012-04-24 08:49:44 EDT
(In reply to comment #6)
> Guys, I haven't looked in detail whether pyviz could be modified to adapt. 
> However, it is clear to me that the new assertions added to the scheduler,
> block out perfectly legal usage, and so, IMHO, they should simply be removed. 

What we are trying to do here is to make sure that Schedule cannot be called from a thread while ::Run is running in another thread. The current assert is a simple way to catch this illegal usage of the API. It does also catch legal uses as you point out but I do not see any simple way to change the assert to handle this case. 

> PyViz accesses the ns-3 API from multiple threads, but it uses a mutex to make
> sure that only one thread accesses ns-3 at any given time.

Right. It is simply hard for us to know this for sure. i.e., If we could 'clear' m_main when ::Run returns, it would work for you but I do not see how this would be done in a portable and efficient way.
Comment 10 Mathieu Lacage 2012-04-24 10:04:27 EDT
Created attachment 1389 [details]
patch to make the visualizer work again
Comment 11 Gustavo J. A. M. Carneiro 2012-04-24 12:34:41 EDT
(In reply to comment #10)
> Created attachment 1389 [details]
> patch to make the visualizer work again

If it works, +1.
Comment 12 Gustavo J. A. M. Carneiro 2012-04-26 10:00:02 EDT
(In reply to comment #11)
> (In reply to comment #10)
> > Created attachment 1389 [details]
> > patch to make the visualizer work again
> 
> If it works, +1.

Unfortunately, it doesn't work.  When I hit the run button, I get:


Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/home/gjc/projects/ns/ns-3-allinone/ns-3-dev/src/visualizer/visualizer/core.py", line 1483, in start
    assert Visualizer.INSTANCE is None
AssertionError
Comment 13 Tom Henderson 2012-05-11 01:24:38 EDT
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > Created attachment 1389 [details]
> > > patch to make the visualizer work again
> > 
> > If it works, +1.
> 
> Unfortunately, it doesn't work.  When I hit the run button, I get:
> 
> 
> Traceback (most recent call last):
>   File "<string>", line 2, in <module>
>   File
> "/home/gjc/projects/ns/ns-3-allinone/ns-3-dev/src/visualizer/visualizer/core.py",
> line 1483, in start
>     assert Visualizer.INSTANCE is None
> AssertionError


Is anyone working this bug now?
Comment 14 Gustavo J. A. M. Carneiro 2012-05-15 13:08:22 EDT
Just needed to fix Mathieu's patch, since he had "cleaned up" some code that was actually important!

changeset:   8762:407d9a51cae8
tag:         tip
user:        Gustavo J. A. M. Carneiro  <gjc@inescporto.pt>
date:        Tue May 15 18:07:05 2012 +0100
summary:     Bug 1410 - Assert in DefaultSimulatorImpl breaks Visualizer module.
Comment 15 Mathieu Lacage 2012-05-19 03:48:40 EDT
(In reply to comment #14)
> Just needed to fix Mathieu's patch, since he had "cleaned up" some code that
> was actually important!
> 
> changeset:   8762:407d9a51cae8
> tag:         tip
> user:        Gustavo J. A. M. Carneiro  <gjc@inescporto.pt>
> date:        Tue May 15 18:07:05 2012 +0100
> summary:     Bug 1410 - Assert in DefaultSimulatorImpl breaks Visualizer
> module.

sorry about that, thanks !