Bug 1079

Summary: MPI code doesn't compile
Product: ns-3 Reporter: Josh Pelkey <jpelkey>
Component: mpiAssignee: George Riley <riley>
Status: RESOLVED FIXED    
Severity: normal CC: gjcarneiro, john.abraham.in, ns-bugs, vedran
Priority: P5    
Version: ns-3-dev   
Hardware: All   
OS: All   
Attachments: patch to fix
mpi results

Description Josh Pelkey 2011-03-25 11:09:07 EDT
[811/1546] cxx: src/mpi/model/mpi-interface.cc -> 
build/debug/src/mpi/model/mpi-interface_1.o 
../src/mpi/model/mpi-interface.cc: In static member function ‘static void 
ns3::MpiInterface::ReceiveMessages()’: 
../src/mpi/model/mpi-interface.cc:245:15: error: ‘pDev’ was not declared in 
this scope 
../src/mpi/model/mpi-interface.cc:250:7: error: ‘pDev’ was not declared in 
this scope 
../src/mpi/model/mpi-interface.cc:254:40: error: ‘Receive’ is not a member 
of ‘ns3::MpiNetDevice’ 
In file included from debug/ns3/attribute.h:25:0, 
                 from debug/ns3/nstime.h:24, 
                 from ../src/mpi/model/mpi-interface.h:28, 
                 from ../src/mpi/model/mpi-interface.cc:26: 
debug/ns3/ptr.h: In destructor ‘ns3::Ptr<T>::~Ptr() [with T = 
ns3::MpiNetDevice]’: 
../src/mpi/model/mpi-interface.cc:239:25:   instantiated from here 
debug/ns3/ptr.h:455:7: error: ‘class ns3::MpiNetDevice’ has no member named 
‘Unref’ 
debug/ns3/ptr.h: In member function ‘void ns3::Ptr<T>::Acquire() const [with 
T = ns3::MpiNetDevice]’: 
debug/ns3/ptr.h:423:3:   instantiated from ‘ns3::Ptr<T>::Ptr(T*) [with T = 
ns3::MpiNetDevice]’ 
debug/ns3/ptr.h:375:55:   instantiated from ‘ns3::Ptr<T> 
ns3::DynamicCast(const ns3::Ptr<T2>&) [with T1 = ns3::MpiNetDevice, T2 = 
ns3::NetDevice]’ 
../src/mpi/model/mpi-interface.cc:245:57:   instantiated from here 
debug/ns3/ptr.h:410:7: error: ‘class ns3::MpiNetDevice’ has no member named 
‘Ref’ 
Waf: Leaving directory `/home/rmovac/ns-3-allinone/ns-3-dev/build' 
Build failed:  -> task failed (err #1): 
        {task: cxx mpi-interface.cc -> mpi-interface_1.o}

================

Variable pDev was changed to pMpiDev. This is an easy enough fix; however, we also need a smart pointer for MpiNetDevice. I tried to make it use SimpleRefCount, but the compiler threw errors about "error: request for member ‘Unref’ is ambiguous" I'm not exactly sure how to proceed.
Comment 1 Gustavo J. A. M. Carneiro 2011-04-07 10:50:10 EDT
Why not make MpiNetDevice a new ns3::Object that is aggregated to the existing ns3::NetDevice subclass?  Inside PointToPointNetDevice you can GetObject<MpiNetDevice> ().  It's the logical thing to do IMHO...
Comment 2 Josh Pelkey 2011-04-11 14:21:47 EDT
(In reply to comment #1)
> Why not make MpiNetDevice a new ns3::Object that is aggregated to the existing
> ns3::NetDevice subclass?  Inside PointToPointNetDevice you can
> GetObject<MpiNetDevice> ().  It's the logical thing to do IMHO...

Hi, John and I discussed this for a while today, and while he probably has a better understanding of the situation, I am still not certain of the best way to proceed. I was wondering why we shouldn't just add a virtual function to the NetDevice called Receive (or MpiReceive). It seems to me that this should be in there anyway (don't all netdevices need to receive?). This way, in mpi-interface, we can just use a generic NetDevice to call Receive (or MpiReceive) and the appropriate net-device object calls its receive function.

BTW, the whole reason we have this issue is because we are trying to avoid a dependency on point-to-point (or any other more specific) net-device within the mpi model, which I couldn't figure out how to eliminate if using the AggregateObject method suggested above.
Comment 3 Gustavo J. A. M. Carneiro 2011-04-11 14:35:03 EDT
(In reply to comment #2)
> (In reply to comment #1)
> > Why not make MpiNetDevice a new ns3::Object that is aggregated to the existing
> > ns3::NetDevice subclass?  Inside PointToPointNetDevice you can
> > GetObject<MpiNetDevice> ().  It's the logical thing to do IMHO...
> 
> Hi, John and I discussed this for a while today, and while he probably has a
> better understanding of the situation, I am still not certain of the best way
> to proceed. I was wondering why we shouldn't just add a virtual function to the
> NetDevice called Receive (or MpiReceive). It seems to me that this should be in
> there anyway (don't all netdevices need to receive?).

I guess the interface between Channel and NetDevice is proprietary for each link layer technology type, therefore it does not need an abstract virtual method.  This gives l2 models more freedom...

> This way, in
> mpi-interface, we can just use a generic NetDevice to call Receive (or
> MpiReceive) and the appropriate net-device object calls its receive function.
> 
> BTW, the whole reason we have this issue is because we are trying to avoid a
> dependency on point-to-point (or any other more specific) net-device within the
> mpi model, which I couldn't figure out how to eliminate if using the
> AggregateObject method suggested above.

What we have now is:

class PointToPointNetDevice : public NetDevice, public MpiNetDevice

AggregateObject is a (saner) alternative to multiple-inheritance.

Sure, you don't have dependency on point-to-point in the mpi model, but you have dependency on mpi in the point-to-point model.  I suggest only to replace multiple-inheritance with "interface aggregation" (which is what AggregateObject means with a non-intuitive name IMHO) functionality of ns3::Object.
Comment 4 John Abraham 2011-04-11 14:52:17 EDT
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > Why not make MpiNetDevice a new ns3::Object that is aggregated to the existing
> > > ns3::NetDevice subclass?  Inside PointToPointNetDevice you can
> > > GetObject<MpiNetDevice> ().  It's the logical thing to do IMHO...
> > 
> > Hi, John and I discussed this for a while today, and while he probably has a
> > better understanding of the situation, I am still not certain of the best way
> > to proceed. I was wondering why we shouldn't just add a virtual function to the
> > NetDevice called Receive (or MpiReceive). It seems to me that this should be in
> > there anyway (don't all netdevices need to receive?).
> 
> I guess the interface between Channel and NetDevice is proprietary for each
> link layer technology type, therefore it does not need an abstract virtual
> method.  This gives l2 models more freedom...
> 
> > This way, in
> > mpi-interface, we can just use a generic NetDevice to call Receive (or
> > MpiReceive) and the appropriate net-device object calls its receive function.
> > 
> > BTW, the whole reason we have this issue is because we are trying to avoid a
> > dependency on point-to-point (or any other more specific) net-device within the
> > mpi model, which I couldn't figure out how to eliminate if using the
> > AggregateObject method suggested above.
> 
> What we have now is:
> 
> class PointToPointNetDevice : public NetDevice, public MpiNetDevice
> 
> AggregateObject is a (saner) alternative to multiple-inheritance.
> 
> Sure, you don't have dependency on point-to-point in the mpi model, but you
> have dependency on mpi in the point-to-point model.  I suggest only to replace
> multiple-inheritance with "interface aggregation" (which is what
> AggregateObject means with a non-intuitive name IMHO) functionality of
> ns3::Object.

Will aggregation cause any concerns or hit any limitations, when there are multiple MPI interfaces on the same node? Just double-check
Comment 5 Gustavo J. A. M. Carneiro 2011-04-12 05:44:12 EDT
(In reply to comment #4)
> Will aggregation cause any concerns or hit any limitations, when there are
> multiple MPI interfaces on the same node? Just double-check

If you aggregate an Mpi object to a NetDevice, you can only have one Mpi object per NetDevice.  But if you have multiple NetDevices in a Node, you get multiple Mpi objects on that same Node.
Comment 6 Tom Henderson 2011-04-22 00:40:57 EDT
Created attachment 1091 [details]
patch to fix

This patch implements Gustavo's suggestion; it was tested against third-distributed and simple-distributed.
Comment 7 Josh Pelkey 2011-04-26 15:01:26 EDT
Created attachment 1099 [details]
mpi results

I've run some performance tests using the nms-p2p-nix-distributed example (slightly modified to reduce the total wall clock time). It looks like there is minimal difference on the total simulation time with this patch. I tested ns-3-dev with this patch against the old ns-3.9 branch. I also tested it against ns-3.9 with this patch applied for a more direct test. Results attached. In summary, there is little or no difference with this patch applied. It seems like ns-3-dev performs better in general though, possibly due to the re-modularization efforts.

I pushed this fix to ns-3-dev, changeset 5feb5c87d56f