Bug 2595 - Avoid sorting Ptrs
Avoid sorting Ptrs
Status: PATCH PENDING
Product: ns-3
Classification: Unclassified
Component: spectrum
ns-3-dev
All All
: P3 major
Assigned To: Nicola Baldo
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2016-12-22 19:16 EST by Alexander Krotov
Modified: 2017-01-04 00:17 EST (History)
2 users (show)

See Also:


Attachments
Proposed patch (4.69 KB, patch)
2016-12-22 19:16 EST, Alexander Krotov
Details | Diff
Patch to delete comparison operators for Ptr (1.66 KB, patch)
2016-12-22 19:29 EST, Alexander Krotov
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Krotov 2016-12-22 19:16:03 EST
Created attachment 2712 [details]
Proposed patch

Recently I have found a very strange bug in one of my experiments. Switching between glibc malloc and tcmalloc changed results and valgrind showed no problems.

The reason was std::set<Ptr<SpectrumPhy>> in MultiModelSpectrumChannel. When StartTx iterates over the set, the order depends on pointer values. There order in which receivers are processed does not matter. But if all receivers generate some kind of responses, the order does matter, because original sender considers first response as the signal and all other responses as interference.

One of the goals of NS-3 is reproducible results so I think nothing should depend on pointer values. Here I attach the patch that replaces std::set with std::vector in MultiModelSpectrumChannel and actually fixes the bug for me.
Comment 1 Alexander Krotov 2016-12-22 19:29:05 EST
Created attachment 2713 [details]
Patch to delete comparison operators for Ptr

Actually the problem is more general than just this case.

NS-3 should give reproducible results even in case of simultaneous events. It is the reason EventId has m_uid and SpectrumModelUid_t is compared instead of Ptr<const SpectrumModel>. These IDs depend on the order of object creation but not on the memory allocator so they provide reliable tie breaking rules.

I think every Ptr comparison in NS-3 indicates a potential problem. So I propose a patch to delete Ptr comparison operators. This patch results in broken build as Ptr is used as std::map and std::set key in many cases. So applying this patch can show all these cases. Should we eliminate all of them?

Another potential problem I can think of is unstable std::sort used instead of std::stable_sort. Probably we should adopt the rule to use std::stable_sort everywhere.
Comment 2 Nicola Baldo 2016-12-22 19:48:25 EST
Hi Alexander,

thanks for investigating this issue. Please find some preliminary comments inline below

(In reply to Alexander Krotov from comment #0)
> The reason was std::set<Ptr<SpectrumPhy>> in MultiModelSpectrumChannel. When
> StartTx iterates over the set, the order depends on pointer values. There
> order in which receivers are processed does not matter. But if all receivers
> generate some kind of responses, the order does matter, because original
> sender considers first response as the signal and all other responses as
> interference.
> 
> One of the goals of NS-3 is reproducible results so I think nothing should
> depend on pointer values. 

If the issue happens when two event are scheduled with the same time, wouldn't you get the same artifact by switching to a different scheduler implementation? 
I think the reproducibility issue here lies in the PHY model. A good PHY model should behave the same independently of the order of execution of two simultaneous simulation events.


> Here I attach the patch that replaces std::set
> with std::vector in MultiModelSpectrumChannel and actually fixes the bug for
> me.

I am a bit concerned about performances issues due to the slower lookup of entries in std::vector::find compared with std::set::find. Some profiling in large scenarios would be needed to quantify this better.

(In reply to Alexander Krotov from comment #1)
> NS-3 should give reproducible results even in case of simultaneous events.
> It is the reason EventId has m_uid and SpectrumModelUid_t is compared
> instead of Ptr<const SpectrumModel>. These IDs depend on the order of object
> creation but not on the memory allocator so they provide reliable tie
> breaking rules.
> 
> I think every Ptr comparison in NS-3 indicates a potential problem. So I
> propose a patch to delete Ptr comparison operators. This patch results in
> broken build as Ptr is used as std::map and std::set key in many cases. So
> applying this patch can show all these cases. Should we eliminate all of
> them?

This approach looks a bit overkill to me... however I'd be curious to know how big the resulting change would be :-P
Comment 3 Alexander Krotov 2016-12-22 20:50:14 EST
> If the issue happens when two event are scheduled with the same time, wouldn't you get the same artifact by switching to a different scheduler implementation? 

No, that is what m_uid in EventId is for. I mentioned it in the second message. It is an established approach to turn "unstable" schedulers into "stable" ones: "Nonstable priority queues can be made stable by the introduction of an auxiliary priority field, a sequence number [McCormack and Sargent 1981], to break ties." (see https://dl.acm.org/citation.cfm?id=249205 and pdf link http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.4.3753&rep=rep1&type=pdf ) McCormack and Sargent paper is dated April 1980 actually: http://eprints.cs.vt.edu/archive/00000845/01/CS80002-R.pdf

Both NS-2 and NS-3 use this approach to make runs deterministic even across schedulers, so I guess the original intention was to make runs reproducible even in presence of simultaneous events and models that depends on their order.

> I think the reproducibility issue here lies in the PHY model. A good PHY model should behave the same independently of the order of execution of two simultaneous simulation events.

Lets say you have a wireless protocol where you send a message and all receivers who hear you reply with an acknowledgement. To resolve collisions receivers use CDMA or something. If you don't model low-level details and use probabilistic model, the probability of resolving a collision would be drawn from the same random stream. And the order in which acknowledgements are received (even though they are simultaneously received) decides which position in RNG sequence corresponds to which receiver. So the results will be different depending on the order of simultaneously received acknowledgements.

Is it really a duty of the PHY model to break ties in this case? Should it postpone processing of all received signals until it receives the last one (how can it determine which is the last one?), sort them according to some ID such as MAC address (what if it is not available?) and draw probabilities only after that? I think it is much easier to make the order in which channel processes receivers deterministic than trying to restore order on the PHY.

> I am a bit concerned about performances issues due to the slower lookup of entries in std::vector::find compared with std::set::find. Some profiling in large scenarios would be needed to quantify this better.

AddRx is called only once per receiver in the beginning of the simulation when you attach receiver to the wireless channel. StartTx, which is called much more often, just iterates over the container and schedules signal reception to each receiver. Performance should improve as iterating over std::vector is generally faster than iterating over std::set (which is usually implemented as an RB-tree inside).

> This approach looks a bit overkill to me... however I'd be curious to know how big the resulting change would be :-P

Ok, I may try to implement it, I don't think the change will be big. Actually, I think it is not an overkill if you want the simulator to be deterministic. If NS-3 strives for completely reproducible runs, Ptr comparisons and unstable sorts are bugs.