Bug 154 - Attach sockets to nodes
: Attach sockets to nodes
Status: RESOLVED FIXED
: ns-3
node module
: pre-release
: All All
: P3 normal
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2008-03-27 12:18 EDT by
Modified: 2008-07-01 13:32 EDT (History)


Attachments
This is Gustavo's test case worked out in full. (2.64 KB, patch)
2008-03-27 20:48 EDT, Rajib Bhattacharjea
Details | Diff
Proposed solution (628 bytes, patch)
2008-03-27 20:58 EDT, Rajib Bhattacharjea
Details | Diff
raj's patch updated to ns-3-dev (673 bytes, text/x-diff)
2008-05-28 12:04 EDT, Gustavo J. A. M. Carneiro
Details


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2008-03-27 12:18:19 EDT
I just spent about an hour debugging a problem in a student's NS-3 program. 
And the student spent probably a day :-)  But I think this problem can be
easily prevented...

The student's code had roughly this structure:

using namespace ns3;
class Test
{
public:

  void ReceivePacket (Ptr<Socket> socket, Ptr<Packet> packet, const Address
&from);
  bool run(void);
};

void Test::ReceivePacket (Ptr<Socket> socket, Ptr<Packet> packet, const Address
&from)
{
  NS_LOG_INFO("receive packet");
  [...]
}



bool
Test::run(void)
{
[...] creates nodes [...]
    NS_LOG_INFO ("create socket  receive");
    Ptr<SocketFactory> socketFactory= rxNode->GetObject<Udp> ();
    Ptr<Socket> rxSocket= socketFactory->CreateSocket();
    rxSocket=->Bind (InetSocketAddress (Ipv4Address ("10.1.4.2"), 1234));
    rxSocket=->SetRecvCallback (MakeCallback (&Test::ReceivePacket, this));

    NS_LOG_INFO("send socket ");
    Ptr<SocketFactory> n0SocketFactory = n0->GetObject<Udp> ();
    Ptr<Socket> n0socket = n0SocketFactory->CreateSocket ();
    Ptr<Packet> packet=Create <Packet>();
    NS_LOG_INFO("....");
    n0socket->SendTo (InetSocketAddress (Ipv4Address("10.1.4.2"),
1234),packet); 

    return true;
}

int 
main (int argc, char *argv[])
{
  Test teste;
  teste.run();
[...]
  NS_LOG_INFO ("Run Simulation.");
  Simulator::StopAt (Seconds (3.0));
  Simulator::Run ();    
  Simulator::Destroy ();
  NS_LOG_INFO ("Done.");
}

This code does not work because "Ptr<Socket> rxSocket" inside Test.run() goes
out of scope before the simulation even begins, and so the socket is destroyed.

I would propose that nodes would keep a list of pointers to sockets.  They
would release those references when the socket is Close()ed, or when the node
is disposed.  This is more or less what is done with Applications already...
------- Comment #1 From 2008-03-27 12:26:55 EDT -------
The expectation is that sockets are held by Applications which are held by
Nodes. So, if you want to avoid these problems, the best way to do so is by
providing new subclasses of the Application base class which hold your sockets.
------- Comment #2 From 2008-03-27 12:37:33 EDT -------
Holding sockets in some class, be it Application or otherwise, is more work
than needed.  NS-3 can save the programmer work by doing this work by itself,
with little cost.

Actually the cost here is _discovering_ the problem.  It wasn't at all obvious,
even for me, and only through debugging with NS_LOG was I able to track it
down.  The solution to problem is easy; diagnosing the problem was hard.

Eliminating nasty surprises in the simulator is part of our job.  So I strongly
recommend either make  it work (preferably) or detect the problem automatically
and give an error message.  I only know how to make it work, not detect.
------- Comment #3 From 2008-03-27 12:55:37 EDT -------
(In reply to comment #2)
> Holding sockets in some class, be it Application or otherwise, is more work
> than needed.  NS-3 can save the programmer work by doing this work by itself,

I understand you feel pretty strongly that this is a user bug which can be
avoided but,

> with little cost.

The memory cost is not "little" so, I am very uneasy about this proposal,
especially since there is another fix, as outlined in my previous comment. I am
also pretty uneasy about this proposed change because I really think that it is
a bug to create a socket, release it, and expect it to stay around while you
run the simulation. 

Furthermore, I have a very hard time to see how your proposal would gracefully
deal with appearing and disappearing sockets: how can you (user) make sure that
a socket is really released ? Do you have to go into the array attached to the
node and remove it from there ? _that_ sounds really bad.
------- Comment #4 From 2008-03-27 13:13:23 EDT -------
(In reply to comment #3)
> (In reply to comment #2)
> > Holding sockets in some class, be it Application or otherwise, is more work
> > than needed.  NS-3 can save the programmer work by doing this work by itself,
> 
> I understand you feel pretty strongly that this is a user bug which can be
> avoided but,
> 
> > with little cost.
> 
> The memory cost is not "little" so, I am very uneasy about this proposal,
> especially since there is another fix, as outlined in my previous comment. I am
> also pretty uneasy about this proposed change because I really think that it is
> a bug to create a socket, release it, and expect it to stay around while you
> run the simulation. 

The memory cost is one pointer per socket.  Surely that is negligible.  Same
problem as keeping a list of applications in the Node.

> 
> Furthermore, I have a very hard time to see how your proposal would gracefully
> deal with appearing and disappearing sockets: how can you (user) make sure that
> a socket is really released ? Do you have to go into the array attached to the
> node and remove it from there ? _that_ sounds really bad.

Like I said, I would envision socket->Close() to take care of it.  Or even if a
socket is closed from the remote side, the Node could then release its
reference to it (since a closed socket cannot be used for communication
anymore).
------- Comment #5 From 2008-03-27 14:38:26 EDT -------
I agree that it is not really intuitive the way the code now works, as
exemplified by this test case.  I as a user would expect that if I used a
socket and passed data to it, I could delete my local reference to the socket
afterwards, just as I could lose reference to the descriptor in real code
without destroying the underlying packet.

I would suggest trying to fix this somehow such that the provided test case
works; not by suggesting that users subclass from Application, because
otherwise it is bound to reoccur since I have now seen three ns-3 users write
code like this that avoids the use of base class Application.
------- Comment #6 From 2008-03-27 15:22:24 EDT -------
The bottom line is that SOMEONE has to retain a reference to the socket
throughout its lifetime.  So far we have two alternatives, one being that the
node retains references to its sockets (Gustavo's suggestion), the other being
that the user of sockets (Applications or end-users) keeps a reference (current
expectation).

An approach we took in Tcp for a similar problem was simply to count the
references to the socket that live in the Ipv4Endpoint demux:
http://code.nsnam.org/ns-3-dev/rev/4956586bd798

In UdpSocket::FinishBind, Note that "this" effectively copied into into the
endpoint by passing the callback.  It is never Ref'd however.  Our fix was to
pass a smart pointer into the callback, Ref'ing it in the process.

This way, when the endpoint is deallocated, the smart pointer goes out of scope
and Unrefs the socket.  The assures that the lifetime of the socket is tied to
the lifetime of its entry in the demux, which seems more correct to me.

So a change similar to what we did in TcpSocket should do the trick.
------- Comment #7 From 2008-03-27 16:46:20 EDT -------
(In reply to comment #6)
> The bottom line is that SOMEONE has to retain a reference to the socket
> throughout its lifetime.  So far we have two alternatives, one being that the
> node retains references to its sockets (Gustavo's suggestion), the other being
> that the user of sockets (Applications or end-users) keeps a reference (current
> expectation).
> 
> An approach we took in Tcp for a similar problem was simply to count the
> references to the socket that live in the Ipv4Endpoint demux:
> http://code.nsnam.org/ns-3-dev/rev/4956586bd798
> 
> In UdpSocket::FinishBind, Note that "this" effectively copied into into the
> endpoint by passing the callback.  It is never Ref'd however.  Our fix was to
> pass a smart pointer into the callback, Ref'ing it in the process.
> 
> This way, when the endpoint is deallocated, the smart pointer goes out of scope
> and Unrefs the socket.  The assures that the lifetime of the socket is tied to
> the lifetime of its entry in the demux, which seems more correct to me.

What happens if a user releases the reference to the socket and he never calls
Close ?

> 
> So a change similar to what we did in TcpSocket should do the trick.
> 

(In reply to comment #6)
> The bottom line is that SOMEONE has to retain a reference to the socket
> throughout its lifetime.  So far we have two alternatives, one being that the
> node retains references to its sockets (Gustavo's suggestion), the other being
> that the user of sockets (Applications or end-users) keeps a reference (current
> expectation).
> 
> An approach we took in Tcp for a similar problem was simply to count the
> references to the socket that live in the Ipv4Endpoint demux:
> http://code.nsnam.org/ns-3-dev/rev/4956586bd798
> 
> In UdpSocket::FinishBind, Note that "this" effectively copied into into the
> endpoint by passing the callback.  It is never Ref'd however.  Our fix was to
> pass a smart pointer into the callback, Ref'ing it in the process.
> 
> This way, when the endpoint is deallocated, the smart pointer goes out of scope
> and Unrefs the socket.  The assures that the lifetime of the socket is tied to
> the lifetime of its entry in the demux, which seems more correct to me.
> 
> So a change similar to what we did in TcpSocket should do the trick.

I was thinking along similar lines but, for UDP, the lifetime of the socket is
much more problematic: i.e., it is not bound to an underlying stream which can
be closed or not by the remote side. So, if the user forgets to call Close, the
socket will live forever, even though it has data to send and cannot forward
its incoming data to anyone.

I might be worried about a non-issue though.
------- Comment #8 From 2008-03-27 16:52:16 EDT -------
(In reply to comment #5)
> I agree that it is not really intuitive the way the code now works, as
> exemplified by this test case.  I as a user would expect that if I used a
> socket and passed data to it, I could delete my local reference to the socket
> afterwards, just as I could lose reference to the descriptor in real code
> without destroying the underlying packet.

Sure, however, what is happening in the real world is that it is your
application which keeps track of open file descriptors and the kernel frees
them when the application dies. The problem here, is that you have associate
your socket to a user context to be able to destroy it when the user context
disapears and the user context in ns3 is the Application class.
------- Comment #9 From 2008-03-27 19:36:56 EDT -------
(In reply to comment #7)
[snip]
> What happens if a user releases the reference to the socket and he never calls
> Close ?
> 
> I was thinking along similar lines but, for UDP, the lifetime of the socket is
> much more problematic: i.e., it is not bound to an underlying stream which can
> be closed or not by the remote side. So, if the user forgets to call Close, the
> socket will live forever, even though it has data to send and cannot forward
> its incoming data to anyone.
> 
> I might be worried about a non-issue though.

I think that the situation you are describing would become an error on the
user's part under my proposal.  In the real world, if you don't call close on a
socket, doesn't it persist indefinitely?  From this perspective, this is the
correct behavior.

The Socket WILL continue to deliver data up to the application however via the
callback mechanism.  The Socket persists and has a pointer to the application
upcall.

And luckily for us, Simulator::Destroy will eventually lead to
UdpL4Protocol::DoDispose, which will delete it's demux, the destructor of which
will delete each endpoint, the destructor of which will release the last
reference to the Socket (the ones that live in the endpoint destroy and rx
callbacks).  So this is shouldn't leak memory from a valgrind perspective
either.
------- Comment #10 From 2008-03-27 19:49:57 EDT -------
(In reply to comment #9)
> (In reply to comment #7)
> [snip]
> > What happens if a user releases the reference to the socket and he never calls
> > Close ?
> > 
> > I was thinking along similar lines but, for UDP, the lifetime of the socket is
> > much more problematic: i.e., it is not bound to an underlying stream which can
> > be closed or not by the remote side. So, if the user forgets to call Close, the
> > socket will live forever, even though it has data to send and cannot forward
> > its incoming data to anyone.
> > 
> > I might be worried about a non-issue though.
> 
> I think that the situation you are describing would become an error on the
> user's part under my proposal.

Hey, your proposal is the same as mine :-)

I just did not understand the implementation very well...

>  In the real world, if you don't call close on a
> socket, doesn't it persist indefinitely?  From this perspective, this is the
> correct behavior.

Agreed, a socket should live until told to die.  A smart pointer going out of
scope should not be enough to make a socket die.

> 
> The Socket WILL continue to deliver data up to the application however via the
> callback mechanism.  The Socket persists and has a pointer to the application
> upcall.
> 
> And luckily for us, Simulator::Destroy will eventually lead to
> UdpL4Protocol::DoDispose, which will delete it's demux, the destructor of which
> will delete each endpoint, the destructor of which will release the last
> reference to the Socket (the ones that live in the endpoint destroy and rx
> callbacks).  So this is shouldn't leak memory from a valgrind perspective
> either.

Indeed.

IMHO a socket, once created, should remain under custodian of its factory.  The
factory should remove its reference to the socket when either:
 1. the factory dies (and the factory dies when the node is disposed, at the
end of the simulation)
 2. it is explicitly closed, by calling Socket::Close ()
 3. the remote end point closes the connection, for connection oriented
sockets.

I don't think this costs much in terms of memory (at most 16 bytes per socket;
that's less than 1.6 MiB for 10^5 sockets).  And like Raj said, having to
close() sockets is already what happens in the real world.
------- Comment #11 From 2008-03-27 20:48:20 EDT -------
Created an attachment (id=119) [details]
This is gustavo
------- Comment #12 From 2008-03-27 20:49:49 EDT -------
(In reply to comment #11)
> Created an attachment (id=119) [edit] [details]
> This is gustavo
> 

Should read "This is Gustavo's test case worked out in full"
------- Comment #13 From 2008-03-27 20:58:21 EDT -------
Created an attachment (id=120) [details]
Proposed solution

To clarify to Gustavo:

The implementation is as follows:  Instead of having the creator of the
UdpSocket (what you call the factory) maintain a reference to it and consuming
an extra amount of memory, this approach effectively Ref's the socket one more
time to account for the pointer that exists to the socket in the endpoint
callbacks (Ipv4EndPoint::m_rxCallback and m_destroyCallback).  From a reference
counting perspective, I contend this is the right thing to do anyway.

This fix, while somewhat orthogonal to the issue at hand, also happens to solve
the problem.
------- Comment #14 From 2008-03-28 00:12:14 EDT -------
> IMHO a socket, once created, should remain under custodian of its factory.  The
> factory should remove its reference to the socket when either:
>  1. the factory dies (and the factory dies when the node is disposed, at the
> end of the simulation)
>  2. it is explicitly closed, by calling Socket::Close ()
>  3. the remote end point closes the connection, for connection oriented
> sockets.
> 
> I don't think this costs much in terms of memory (at most 16 bytes per socket;
> that's less than 1.6 MiB for 10^5 sockets).  And like Raj said, having to
> close() sockets is already what happens in the real world.
> 

Yes, but I think it is not completely clear cut.  We are debating the issue of
what happens if the user forgets to close the socket.  In your proposal, we
could leak sockets (descriptors) in such a case.  As presently implemented, we
would delete the socket instead of closing it.  Neither really captures what
happens in real code, in which close() is called when the process goes away
before the user closed it.

Your and Raj's proposal would err on the side of keeping sockets around, and I
would be fine with that, because I think it would be less intuitive to a user
to have sockets disappear without even an implicit close() ever having
occurred.  This is probably more relevant to consider with TCP but might be a
problem with UDP if we ever model the case where the UDP socket didn't
immediately send the datagram for some reason. 

there is the separate issue of whether ns-3 should do something when methods
like Test::Run() are called before Simulation::Run() is called.  Clearly, it
will provide undefined behavior to use parts of the API outside of a running
simulation context.  Do we want to try to guard against this misuse somehow?
------- Comment #15 From 2008-03-28 06:15:00 EDT -------
(In reply to comment #13)
> Created an attachment (id=120) [edit] [details]
> Proposed solution
> 
> To clarify to Gustavo:
> 
> The implementation is as follows:  Instead of having the creator of the
> UdpSocket (what you call the factory) maintain a reference to it and consuming
> an extra amount of memory, this approach effectively Ref's the socket one more
> time to account for the pointer that exists to the socket in the endpoint
> callbacks (Ipv4EndPoint::m_rxCallback and m_destroyCallback).  From a reference
> counting perspective, I contend this is the right thing to do anyway.
> 
> This fix, while somewhat orthogonal to the issue at hand, also happens to solve
> the problem.

That sounds like a good solution as well, although it has slightly different
semantics from what I proposed.  Under your solution, a reference is kept to
the socket as long as there is a (rx,tx,destroy) callback associated with the
socket.  That makes sense.  A socket with no callbacks and which goes out of
scope should just be freed, as it will not prevent anything from working.
------- Comment #16 From 2008-03-28 06:34:46 EDT -------
(In reply to comment #14)
> Yes, but I think it is not completely clear cut.  We are debating the issue of
> what happens if the user forgets to close the socket.  In your proposal, we
> could leak sockets (descriptors) in such a case.  As presently implemented, we
> would delete the socket instead of closing it.  Neither really captures what
> happens in real code, in which close() is called when the process goes away
> before the user closed it.

If you wanted to capture what _really_ happens in the real world, then sockets
would always have to belong to applications, and be conceptually attached to
applications:

  Ptr<Socket> SocketFactory::CreateSocket (Ptr<Application> application);

Since in the real world sockets are created by _processes_ making a system call
into the kernel.  The kernel would implicitly receive the process ID as
"parameter" and would associate the newly created socket with the process.

In NS-3, because users do not want to have to write new application classes
(ugh!), we would need a 'generic' application.

Ptr<GenericApplication> Node::CreateApplication ();

Applications would then be release their own sockets when they are destroyed.

I think the above model would mimick best what happens in real world.  However,
if applications are only destroyed when a node is destroyed, or when the
programmer explicitly destroys them, their sockets will consequently also only
be destroyed when the application is explicitly destroyed or when the node is
destroyed.  And so this ultimately reallistic API turns is reduced to my
proposed solution where sockets are destroyed explicitly or when the Node is
destroyed.

I am in favour of making the Application "middle man" optional (personally I
tend to never use it), and just allow sockets to be used directly.

> 
> Your and Raj's proposal would err on the side of keeping sockets around, and I
> would be fine with that, because I think it would be less intuitive to a user
> to have sockets disappear without even an implicit close() ever having
> occurred.  This is probably more relevant to consider with TCP but might be a
> problem with UDP if we ever model the case where the UDP socket didn't
> immediately send the datagram for some reason. 

What if I create a OnOff application and forget to dispose it?  It will keep on
sending traffic forever...  same problem.

> 
> there is the separate issue of whether ns-3 should do something when methods
> like Test::Run() are called before Simulation::Run() is called.  Clearly, it
> will provide undefined behavior to use parts of the API outside of a running
> simulation context.  Do we want to try to guard against this misuse somehow?
> 

The logical solution would be:
1. Most APIs, like socket creation, fail an assertion when called and the
simulator not running;
2. It would force programmers to do the simulation initialization stuff in a
callback, scheduled with Simulator::ScheduleNow;

But I don't think that would help.  The Simulation initialization function
would still create a local variable smart pointer, create the socket, and it
would still go out of scope and destroy the socket before time.
------- Comment #17 From 2008-04-24 10:46:51 EDT -------
I just wanted to say that today I found out the same socket error by the same
student again, even after I explained the problem a few weeks ago.

But I give up trying to convince anyone anymore :-(
------- Comment #18 From 2008-04-24 16:11:30 EDT -------
Raj, after today's conference call, you have the green light to commit your
patch.
------- Comment #19 From 2008-05-28 07:24:26 EDT -------
Is there anything blocking the patch?
------- Comment #20 From 2008-05-28 11:14:37 EDT -------
(In reply to comment #19)
> Is there anything blocking the patch?

I will happily apply it if someone can confirm that we have no
valgrind-reported memory leak regressions
------- Comment #21 From 2008-05-28 12:04:01 EDT -------
Created an attachment (id=138) [details]
raj's patch updated to ns-3-dev

Regarding valgrind, waf check --valgrind does not report any problem, with or
without the patch.  waf regression --valgrind reports errors related to packet
tags, with or without the patch.
------- Comment #22 From 2008-05-28 12:13:59 EDT -------
(In reply to comment #21)

> Regarding valgrind, waf check --valgrind does not report any problem, with or
> without the patch.  waf regression --valgrind reports errors related to packet
> tags, with or without the patch.

Please, file a separate bug about the latter problem and merge this patch.
------- Comment #23 From 2008-05-28 12:53:31 EDT -------
Filed bug #192 about the regressions.
Patch pushed.