Bug 631

Summary: RealtimeSimulatorImpl does not handle Ctrl-C with python bindings
Product: ns-3 Reporter: Tom Henderson <tomh>
Component: python bindingsAssignee: Mathieu Lacage <mathieu.lacage>
Status: RESOLVED FIXED    
Severity: normal CC: aquereilhac, craigdo, faker.moatamri, gjcarneiro, mathieu.lacage, ns-bugs, vkawadia
Priority: P3    
Version: ns-3-dev   
Hardware: PC   
OS: Linux   
Bug Depends on:    
Bug Blocks: 603    

Description Tom Henderson 2009-07-08 16:34:54 EDT
Realtime simulations do not presently work from a python program.  The reason seems to be that in ns3module_helpers.cc, the function _wrap_Simulator_Run() calls IsFinished()/RunOneEvent(), which is different from C++ Simulator::Run().  

I understand that RunOneEvent() is there to allow pyviz and other GUIs to step through simulation execution, but if someone is to use RunOneEvent() with realtime, the responsibility lies in the caller of RunOneEvent() to maintain a realtime clock and perform synchronization.  This is documented in RealtimeSimulatorImpl::RunOneEvent().

A possible solution is to detect the presence of this implementation type and behave like Simulator::Run() when RealtimeSimulatorImpl() is selected.
Comment 1 Gustavo J. A. M. Carneiro 2009-07-08 17:21:52 EDT
The reason why RunOneEvent is used is to allow signals to be detected by python during a simulation.  Otherwise when you hit Ctrl-C Python will not notice the interrupt probably until the simulation ends.
Comment 2 Gustavo J. A. M. Carneiro 2009-07-08 17:25:50 EDT
Calling ns3.Simulator.Run(signal_check_frequency=-1) will cause the wrapper to call ns3::Simulator::Run() thereby avoiding the problem.
Comment 3 Tom Henderson 2009-07-13 00:18:55 EDT
(In reply to comment #2)
> Calling ns3.Simulator.Run(signal_check_frequency=-1) will cause the wrapper to
> call ns3::Simulator::Run() thereby avoiding the problem.
> 

This does remedy the problem, at the expense of disabling Control-C.  Is there any hope of a special control-c handler for this case?
Comment 4 Mathieu Lacage 2009-07-13 03:26:08 EDT
I think that a better solution here is simply to declare Simulator::RunOneEvent + Simulator::IsFinished dead and force GUIs to use a thread to run ns-3: RunOneEvent is extremly complex to implement for anything but the default simulator implementation (i.e., parallel simulators are stuck).

If we required the use of threads, we need to figure out how to allow users to still interact with the simulation from another thread. i.e., we need to have some amount of thread-safety built-in. One possible commitment we could make would be to make the simulator.h header fully thread-safe (or, mostly thread-safe) to allow a GUI thread to insert events in the simulator to perform queries in the simulation objects from the simulation thread and then return them to the GUI through a user-specific queue.

The only issue I feel concerned about is that of performance. What is the cost of locking a simulator-wide mutex on event scheduling micro-benchmarks ?
Comment 5 Gustavo J. A. M. Carneiro 2009-07-13 06:56:11 EDT
IMHO we should avoid threads in non-realtime simulator due to performance issues and portability issues.  I think the RunOne solution is so simple, why don't you like it?  It is surely a lot simpler than any threaded solution you may want to throw at it.

The realtime scheduler problem is a different one.  But a solution to it is not trivial.  I think we should just add some way to find out whether the implementation supports RunOne and avoid calling it if not.  Naturally if we do not call RunOne then we'll not catch Ctrl-C, but at least the simulator will work from python automagically.

After that we can try to get Ctrl-C to work again, maybe using threads as you suggest, but to be honest I think it is more work than the feature is worth...
Comment 6 Mathieu Lacage 2009-07-13 07:23:16 EDT
(In reply to comment #5)
> IMHO we should avoid threads in non-realtime simulator due to performance
> issues and portability issues.  I think the RunOne solution is so simple, why
> don't you like it?  It is surely a lot simpler than any threaded solution you
> may want to throw at it.

Because it makes it impossible to support the realtime scheduler, the mpi parallel scheduler, and, the threaded parallel scheduler. To summarize, nothing but the simplest default simulator works with RunOne. So, yes, it's simple, but, it kills our ability to do fancy things underneath: it is exposing too much from the event loop so, I would rather kill it.

> The realtime scheduler problem is a different one.  But a solution to it is not
> trivial.  I think we should just add some way to find out whether the
> implementation supports RunOne and avoid calling it if not.  Naturally if we do
> not call RunOne then we'll not catch Ctrl-C, but at least the simulator will
> work from python automagically.
> 
> After that we can try to get Ctrl-C to work again, maybe using threads as you
> suggest, but to be honest I think it is more work than the feature is worth...

I have other features in mind.
Comment 7 Mathieu Lacage 2009-07-13 07:31:41 EDT
(In reply to comment #6)
> > The realtime scheduler problem is a different one.  But a solution to it is not
> > trivial.  I think we should just add some way to find out whether the
> > implementation supports RunOne and avoid calling it if not.  Naturally if we do
> > not call RunOne then we'll not catch Ctrl-C, but at least the simulator will
> > work from python automagically.
> > 
> > After that we can try to get Ctrl-C to work again, maybe using threads as you
> > suggest, but to be honest I think it is more work than the feature is worth...

what I wanted to point out but forgot to is that it should be perfectly possible to add a Simulator::SetMonitoringCallback (Callback<void> cb);
which is called regularly by the simulation code and you could call PyErr_CheckSignals from that callback. [other variants on this theme are easy to code]

The key here is that it allows the simulator code much more leeway in doing what it needs to do.


Comment 8 Gustavo J. A. M. Carneiro 2009-07-13 07:32:05 EDT
(In reply to comment #6)
> (In reply to comment #5)
> > IMHO we should avoid threads in non-realtime simulator due to performance
> > issues and portability issues.  I think the RunOne solution is so simple, why
> > don't you like it?  It is surely a lot simpler than any threaded solution you
> > may want to throw at it.
> 
> Because it makes it impossible to support the realtime scheduler, the mpi
> parallel scheduler, and, the threaded parallel scheduler. To summarize, nothing
> but the simplest default simulator works with RunOne. So, yes, it's simple,
> but, it kills our ability to do fancy things underneath: it is exposing too
> much from the event loop so, I would rather kill it.

It does not make it impossible to do anything, you are exaggerating.  For the other schedulers we can use Simulator::Run and not catch signals.  Not being able to catching signals in Python is not a great loss IMHO.

Maybe what we want is to remove RunOne as a generic API and make it specific to the default simulator implementation.  That's what I would do, if not for your mania of having huge ABCs with many pure virtual methods that have to be implemented even if they make no sense for some subclasses (see also class NetDevice).
Comment 9 Gustavo J. A. M. Carneiro 2009-07-13 08:49:16 EDT
(In reply to comment #7)
> (In reply to comment #6)
> > > The realtime scheduler problem is a different one.  But a solution to it is not
> > > trivial.  I think we should just add some way to find out whether the
> > > implementation supports RunOne and avoid calling it if not.  Naturally if we do
> > > not call RunOne then we'll not catch Ctrl-C, but at least the simulator will
> > > work from python automagically.
> > > 
> > > After that we can try to get Ctrl-C to work again, maybe using threads as you
> > > suggest, but to be honest I think it is more work than the feature is worth...
> 
> what I wanted to point out but forgot to is that it should be perfectly
> possible to add a Simulator::SetMonitoringCallback (Callback<void> cb);
> which is called regularly by the simulation code and you could call
> PyErr_CheckSignals from that callback. [other variants on this theme are easy
> to code]

Calling PyErr_CheckSignals for _every_ scheduler iteration is bad for performance, because you also have to acquire the Python GIL (Global Interpreter Lock).  The current code does this every 100 events by default.

Also without RunOne you will make me jump through hoops in order to get real-time visualization to work (assuming I will have time one day to port ns-3-pyviz to ns-3-dev).  Not nice.

> 
> The key here is that it allows the simulator code much more leeway in doing
> what it needs to do.
> 

All I ask is to keep the default (and most important one!) scheduler to work as before.  Forget about parallel/realtime/whatever, those are cool but not exactly what most people are going to use.  I think you are giving it excessive importance.  No offense intended to whoever is putting hard work into implementing them! :)
Comment 10 Mathieu Lacage 2009-07-13 09:02:38 EDT
(In reply to comment #9)
> > what I wanted to point out but forgot to is that it should be perfectly
> > possible to add a Simulator::SetMonitoringCallback (Callback<void> cb);
> > which is called regularly by the simulation code and you could call
> > PyErr_CheckSignals from that callback. [other variants on this theme are easy
> > to code]
> 
> Calling PyErr_CheckSignals for _every_ scheduler iteration is bad for
> performance, because you also have to acquire the Python GIL (Global
> Interpreter Lock).  The current code does this every 100 events by default.

I would expect you (callee) to implement a counter and avoid costly checks based on the value of this counter.

> Also without RunOne you will make me jump through hoops in order to get
> real-time visualization to work (assuming I will have time one day to port
> ns-3-pyviz to ns-3-dev).  Not nice.

Ok, so, what are the requirements of ns-3-pyviz ? What would be the problem if you used the above-mentioned callback to perform regular runs of the GTK/QT mainloop ?

> > The key here is that it allows the simulator code much more leeway in doing
> > what it needs to do.
> > 
> 
> All I ask is to keep the default (and most important one!) scheduler to work as
> before.  Forget about parallel/realtime/whatever, those are cool but not
> exactly what most people are going to use.  I think you are giving it excessive
> importance.  No offense intended to whoever is putting hard work into
> implementing them! :)

If these simulators can't implement all the methods from src/simulator/simulator.h, then, yes, people will not use them very much but this is a self-fulfilling profecy. 

What I am trying to do is to make sure _every_ simulator can replace the other simulators (it's not the case, already). And I am not talking about removing functionality to do this. I am merely trying to find out what is the proper API we can expose which gives you the functionality you need and which is implementable correctly by every simulator.
Comment 11 Craig Dowell 2009-07-13 15:02:17 EDT
I agree with Gustavo in that there a simple case that we should make very simple to use.  Using the default simulator implementation and calling in to run one event seems to be a nice approach -- it seems to be to be such a nice mechanism for GUIs that I think we should resist taking it away.  I agree with Mathieu wholeheartedly in that RunOneEvent just isn't reasonable for anthing but the simplest case, though.  There are alternatives like scheduling periodic callbacks, but then you immediately run into complication either at the low level since you lose control for arbitrary lengths of time, or at hgih level since the callback comes back in the context of the calling thread. 

I hate to bring it up (okay, I don't), but I think we are really swerving into the question of how to deal with capabilities.  For the NetDevice we have degenerated into a bunch of pure virtual methods asking capabilities questions like IsBroadcast, IsMulticast, IsBridge, IsPointToPoint (that are sometimes mixed in with state questions IsLinkUp).  Using this approach you would do something like

  if (nd->IsBlah ())
    {
      nd->DoSomething ();
    }

I hope we don't start adding similar things to Simulator.  I originally wanted to give QueryObject powers to the simulator to resolve some of this kind of issue.  A similar thing came up in the context of getting the actual underlying real time clock and scheduling real time events.  A compromise was reached and we gave clients the ability to get and cast a pointer to the underlying implementation.  You can see this in the (not very robust code from) emu device,

  DynamicCast<RealtimeSimulatorImpl> (
    Simulator::GetImplementation ())->ScheduleRealtimeNow (
      MakeEvent (&EmuNetDevice::ForwardUp, this, buf, len));

One way to deal with this problem is to QueryObject for a single event "Interface" and use it if you need it.  The default simulator implementation could aggregate an interface for single stepping the simulation, while the more complex implementations could decide based on their own issues whether or not to provide it.  

  pi = nd->QueryObject<SimulatorSingleStepper> ();

  ...

  if (pi)
    {
      pi->RunOneEvent ();
    }

If we do it this way, we should think about making all of our similar capabilities checks work the same way (i.e. net device)

Another way to deal with this issue is through some form of traits, templated or inherited.  Perhaps

  if (ns3::simulator_capabilities<RealTimeSimulatorTraits>::CanRunOneEvent ())
    {
       Simulator::RunOneEvent ();
    }

This requires the client to know the concrete type of the class, and keep the trait queries coherent with the current implementation.

If we inherit capabilities classes (ADT) then we end up with something like

  if (ns3::Simulator::Capabilities::CanRunOneEvent ())
    {
       Simulator::RunOneEvent ();
    }

But then you have to deal with plumbing the capabilities from the Simulator wrapper down into the implementation, which I hate.  We just got done pulling a lot of that nonsense out of Ipv4, so it should really be more like the emu example above

  pi = Simulator::GetImplementation ();
  if (pi->Capabilities::CanRunOneEvent ())
    {
       Simulator::RunOneEvent ();
    }

But if you think about it, this is quite similar to a QI but lacks the clean-ness of the interface separation.

Anyway, we are at the point again, where contemplating this stuff always seems like too much work to get feature X alone working, but if we don't deal with it, we end up with loads of silly and annoying IsBlah () and IsYadda () functions that need to be plumbed and implemented constantly.

It might not surprise most of you, but I like the QueryObject option.
Comment 12 Mathieu Lacage 2009-07-13 15:29:39 EDT
(In reply to comment #11)
> I agree with Gustavo in that there a simple case that we should make very
> simple to use.  Using the default simulator implementation and calling in to

Writing a GUI is not what I could call a "simple case". The common case is writing a simulation script, not an interactive simulation GUI. I think that it is very reasonable to ask a GUI developer to use a special API which is not _the most simple API_ to use but which is still ok. A callback seems really very reasonable, unless, of course, someone can point out a fundamental flaw with that idea.

> run one event seems to be a nice approach -- it seems to be to be such a nice
> mechanism for GUIs that I think we should resist taking it away.  I agree with
> Mathieu wholeheartedly in that RunOneEvent just isn't reasonable for anthing
> but the simplest case, though.  There are alternatives like scheduling periodic
> callbacks, but then you immediately run into complication either at the low
> level since you lose control for arbitrary lengths of time, or at hgih level
> since the callback comes back in the context of the calling thread. 

Potentially failing to work in some cases where the user has abused the monitoring callback seems much less problematic than having ::RunOneEvent not work at all.

[IsBlah ()]

> I hope we don't start adding similar things to Simulator.  I originally wanted

*shudder*. I feel confident: "over my dead body" :)

> to give QueryObject powers to the simulator to resolve some of this kind of
> issue.  A similar thing came up in the context of getting the actual underlying
> real time clock and scheduling real time events.  A compromise was reached and
> we gave clients the ability to get and cast a pointer to the underlying
> implementation.  You can see this in the (not very robust code from) emu
> device,
> 
>   DynamicCast<RealtimeSimulatorImpl> (
>     Simulator::GetImplementation ())->ScheduleRealtimeNow (
>       MakeEvent (&EmuNetDevice::ForwardUp, this, buf, len));

Simulator::Schedule* needs to be thread-safe to get rid of this, I think, right ?. i.e., we need to profile whether the cost of taking a lock in each method of DefaultSimulatorImpl and making EventImpl::Ref/Unref threadsafe is not a performance problem (I have local patches for the latter). If not, then, we should kill this ugly code (which I am responsible for, I know).

> One way to deal with this problem is to QueryObject for a single event
> "Interface" and use it if you need it.  The default simulator implementation
> could aggregate an interface for single stepping the simulation, while the more
> complex implementations could decide based on their own issues whether or not
> to provide it.  

[capabilities]

> > But if you think about it, this is quite similar to a QI but lacks the
> clean-ness of the interface separation.

I have a very hard time understanding how this is any different from using Simulator::GetImplementation ()->GetObject<MyCapability> () which I could live with but which, really, I would like to avoid.
Comment 13 Craig Dowell 2009-07-13 17:02:28 EDT
> Writing a GUI is not what I could call a "simple case".

Not that it's super-important, but I'm not suggesting that writing a GUI is simple.  I'm suggesting that we should continue to support a simple way to single step the simulation, which in turn, makes this kind of application easier to write.  We have a user suggesting that this makes his life easier, and I can see his point clearly.

[ ... ]

>>   DynamicCast<RealtimeSimulatorImpl> (
>>     Simulator::GetImplementation ())->ScheduleRealtimeNow (
>>       MakeEvent (&EmuNetDevice::ForwardUp, this, buf, len));
>
> Simulator::Schedule* needs to be thread-safe to get rid of this,
> I think, right?. 

Yes.  These methods are thread-safe in the realtime simulator, but not in the default case.  However, IMO, having ScheduleReal method along side of Schedule methods is dangerous since there are side-effects that may bite users (cf, non-discrete-event simulator anymore makes certain models stop working).

>i.e., we need to profile whether the cost of taking a lock 
> in each method of DefaultSimulatorImpl and making 
> EventImpl::Ref/Unref threadsafe is not a performance problem
> (I have local patches for the latter). If not, then, we
> should kill this ugly code (which I am responsible for, I know).

Having thread safe methods seems like a good idea, but it probably means a kernel call in the MP case.  We might be able to get away with a TAS instruction optimization, but this is pretty much the definition of non-portable :-)

I still think the realtime versions should be broken out so that non-real-time simulations can't be lulled into thinking they are or doing something possibly completely unexpected.  It makes sense to me to break out the single step stuff.

[ ... ]

> A callback seems really very reasonable, unless, of course, someone 
> can point out a fundamental flaw with that idea.

I always thought that these callbacks could be scheduled just like any other event, BTW.  With the realtime scheduler, you could just arrange to have yourself called back every 100ms, for example.

If you write an event loop with a message pump, your callback can just send a message to the event loop, but this implies a thread for running the message pump and a thread for calling into the simulator.  So you begin talking multithreaded pretty quickly.

If you just have a single step method, you can call it out of your event loop and don't have to deal with multithreading.

It's just easier to deal with, IMO.

> > > But if you think about it, this is quite similar to a QI but lacks the
> > clean-ness of the interface separation.

> I have a very hard time understanding how this is any different 
> from using Simulator::GetImplementation ()->GetObject<MyCapability> () 
> which I could live with but which, really, I would like to avoid.

Like I said, it's "quite similar."  The difference is that you can share one interface class across simulator implelemtations and then don't have a dependency on the concrete class of the implementation in the client.  More than one concrete implementation class can provide a single SingleStep interface and the client doesn't have to know every possible concrete class to work with multiple of them.
Comment 14 Gustavo J. A. M. Carneiro 2010-08-09 12:42:34 EDT
*** Bug 974 has been marked as a duplicate of this bug. ***
Comment 15 Gustavo J. A. M. Carneiro 2010-08-10 07:17:09 EDT
Until a better solution comes up, I pushed this:

changeset:   6541:2f20df778b76
tag:         tip
user:        Gustavo J. A. M. Carneiro  <gjc@inescporto.pt>
date:        Tue Aug 10 12:16:10 2010 +0100
summary:     Python: only call ns3::Simulator::RunOne if we're running the default event scheduler; partially fixes Bug 631.
Comment 16 Gustavo J. A. M. Carneiro 2011-03-27 09:54:02 EDT
Re-titling and lowering priority because what we have now is that real-time scheduler can be used from python, Simulator::RunOne is not used by default; only keyboard interrupt does not work in this setup.  I don't think this is serious (pygtk had this same problem for many many years).
Comment 17 alina 2011-07-19 09:05:33 EDT
A patch for this problem has been proposed in bug #1000 attachment #1197 [details]
Comment 18 Mathieu Lacage 2012-03-26 06:18:16 EDT
changeset: c1740487ced6
Comment 19 Gustavo J. A. M. Carneiro 2012-04-23 10:42:39 EDT
I think this introduced a regression: running python simulation scripts with visualizer fails:

assert failed. cond="SystemThread::Equals (m_main)", msg="Simulator::Schedule Thread-unsafe invocation!", file=../src/core/model/default-simulator-impl.cc, line=209
terminate called without an active exception
Command ['/usr/bin/python', 'src/flow-monitor/examples/wifi-olsr-flowmon.py'] terminated with signal SIGIOT. Run it under a debugger to get more information (./waf --run <program> --command-template="gdb --args %s <args>").
Comment 20 Gustavo J. A. M. Carneiro 2012-04-23 10:47:58 EDT
Nevermind, it was probably the patch that added "thread-safety".  It only allows the "main thread" to run simulations?  Visualizer runs the simulation in another thread...
Comment 21 alina 2012-04-23 11:24:48 EDT
(In reply to comment #20)
> Nevermind, it was probably the patch that added "thread-safety".  It only
> allows the "main thread" to run simulations?  Visualizer runs the simulation in
> another thread...

Hello Gustavo,

ScehduleWithContext is the only schedule method that is thread-safe. The assert you mention was added to the rest of the schedule methods to prevent them from being invoked from a thread other than the thread that processes the events, and thus avoid thread-safety issues.

Would it be possible to change the behavior of the Visualizer to make sure that scheduling from a different thread than the one running the simulator, is always done using ScheduleWithContext?
Comment 22 Gustavo J. A. M. Carneiro 2012-04-23 11:27:59 EDT
(In reply to comment #21)
> (In reply to comment #20)
> > Nevermind, it was probably the patch that added "thread-safety".  It only
> > allows the "main thread" to run simulations?  Visualizer runs the simulation in
> > another thread...
> 
> Hello Gustavo,
> 
> ScehduleWithContext is the only schedule method that is thread-safe. The assert
> you mention was added to the rest of the schedule methods to prevent them from
> being invoked from a thread other than the thread that processes the events,
> and thus avoid thread-safety issues.

Yes, you avoid some thread-safety issues, but you also prevent some legal use cases, like this.  When in doubt, you should trust that the programmer knows what he's doing.

> 
> Would it be possible to change the behavior of the Visualizer to make sure that
> scheduling from a different thread than the one running the simulator, is
> always done using ScheduleWithContext?

Not without huge amount of effort.  I do not have time to redesign the visualizer core.
Comment 23 Gustavo J. A. M. Carneiro 2012-04-23 11:28:59 EDT
(In reply to comment #20)
> Nevermind, it was probably the patch that added "thread-safety".  It only
> allows the "main thread" to run simulations?  Visualizer runs the simulation in
> another thread...

I opened bug 1417 for this.