Bug 468 - Test for possibly unreachable code-- please file a bug report, with a test case, if this is ever hit
Test for possibly unreachable code-- please file a bug report, with a test ca...
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: internet
ns-3.2
All All
: P1 normal
Assigned To: Tom Henderson
:
: 495 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-01-13 11:04 EST by lally
Modified: 2009-03-03 08:29 EST (History)
3 users (show)

See Also:


Attachments
patch to fix (2.21 KB, patch)
2009-02-19 00:20 EST, Tom Henderson
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description lally 2009-01-13 11:04:10 EST
Hi, I've got a simple application protocol that simulates a game engine.  A single server and a parametric number of clients.

When I run it with 18 clients, it runs fine.  32 clients, I get that error message.

Here's a stack trace:
#0  0xfe9f28c8 in ns3::ArpL3Protocol::Lookup (this=0x8086568, packet={m_ptr = 0x80bfd88}, destination={m_address = 167837965}, 
    cache={m_ptr = 0x808fde0}, hardwareDestination=0x80463d0) at ../src/internet-stack/arp-l3-protocol.cc:224
224	              NS_FATAL_ERROR ("Test for possibly unreachable code-- please file a bug report, with a test case, if this is ever hit");
(gdb) bt
#0  0xfe9f28c8 in ns3::ArpL3Protocol::Lookup (this=0x8086568, packet={m_ptr = 0x80bfd88}, destination={m_address = 167837965}, 
    cache={m_ptr = 0x808fde0}, hardwareDestination=0x80463d0) at ../src/internet-stack/arp-l3-protocol.cc:224
#1  0xfe9ed6db in ns3::ArpIpv4Interface::SendTo (this=0x8086398, p={m_ptr = 0x80bfd88}, dest={m_address = 167837965})
    at ../src/internet-stack/arp-ipv4-interface.cc:138
#2  0xfe9b5c10 in ns3::Ipv4Interface::Send (this=0x8086398, p={m_ptr = 0x80bfd88}, dest={m_address = 167837965})
    at ../src/internet-stack/ipv4-interface.cc:169
#3  0xfe9c01f5 in ns3::Ipv4L3Protocol::SendRealOut (this=0x8085710, found=true, route=@0x808f9d0, packet={m_ptr = 0x80bfd88}, 
    ipHeader=@0x8046b30) at ../src/internet-stack/ipv4-l3-protocol.cc:638
#4  0xfe9c9f7a in ns3::MemPtrCallbackImpl<ns3::Ipv4L3Protocol*, void (ns3::Ipv4L3Protocol::*)(bool, ns3::Ipv4Route const&, ns3::Ptr<ns3::Packet>, ns3::Ipv4Header const&), void, bool, ns3::Ipv4Route const&, ns3::Ptr<ns3::Packet>, ns3::Ipv4Header const&, ns3::empty, ns3::empty>::operator() (this=0x80c9e08, a1=true, a2=@0x808f9d0, a3={m_ptr = 0x80bfd88}, a4=@0x8046b30) at callback.h:213
#5  0xfe9c6075 in ns3::Callback<void, bool, ns3::Ipv4Route const&, ns3::Ptr<ns3::Packet>, ns3::Ipv4Header const&, ns3::empty, ns3::empty>::operator() (this=0x80468a0, a1=true, a2=@0x808f9d0, a3={m_ptr = 0x80bfd88}, a4=@0x8046b30) at callback.h:378
#6  0xfe9cf14e in ns3::Ipv4StaticRouting::RequestRoute (this=0x8084078, ifIndex=4294967295, ipHeader=@0x8046b30, packet=
      {m_ptr = 0x80bfd88}, routeReply={<ns3::CallbackBase> = {m_impl = {m_ptr = 0x80c9e08}}, <No data fields>})
    at ../src/internet-stack/ipv4-static-routing.cc:562
#7  0xfe9bb504 in ns3::Ipv4L3Protocol::Lookup (this=0x8085710, ifIndex=4294967295, ipHeader=@0x8046b30, packet=
      {m_ptr = 0x80bfd88}, routeReply={<ns3::CallbackBase> = {m_impl = {m_ptr = 0x80c9e08}}, <No data fields>})
    at ../src/internet-stack/ipv4-l3-protocol.cc:238
#8  0xfe9bb0a7 in ns3::Ipv4L3Protocol::Lookup (this=0x8085710, ipHeader=@0x8046b30, packet={m_ptr = 0x80bfd88}, routeReply=
          {<ns3::CallbackBase> = {m_impl = {m_ptr = 0x80c9e08}}, <No data fields>})
    at ../src/internet-stack/ipv4-l3-protocol.cc:220
#9  0xfe9bf7d1 in ns3::Ipv4L3Protocol::Send (this=0x8085710, packet={m_ptr = 0x80bfd88}, source={m_address = 167837953}, 
    destination={m_address = 167837965}, protocol=17 '\021') at ../src/internet-stack/ipv4-l3-protocol.cc:604
#10 0xfe9d40b9 in ns3::UdpL4Protocol::Send (this=0x80811f8, packet={m_ptr = 0x80bfd88}, saddr={m_address = 167837953}, daddr=
      {m_address = 167837965}, sport=9, dport=49153) at ../src/internet-stack/udp-l4-protocol.cc:205
#11 0xfe9fb8a3 in ns3::UdpSocketImpl::DoSendTo (this=0x80a15b8, p={m_ptr = 0x80ea600}, dest={m_address = 167837965}, port=49153)
    at ../src/internet-stack/udp-socket-impl.cc:354
#12 0xfe9fbdcc in ns3::UdpSocketImpl::SendTo (this=0x80a15b8, p={m_ptr = 0x80ea600}, flags=0, address=@0x8047020)
    at ../src/internet-stack/udp-socket-impl.cc:387
#13 0xfea6ab0f in TorqueServer::SendToAllClients (this=0x80956a8) at ../src/applications/torque/server_app.cc:368
#14 0xfea6d584 in ns3::Ptr<ns3::EventImpl> ns3::Simulator::MakeEvent<void (TorqueServer::*)(), TorqueServer*>(void (TorqueServer::*)(), TorqueServer*)::EventMemberImpl0::Notify () at simulator.h:703
#15 0xfe8ee633 in ns3::EventImpl::Invoke (this=0x80b6a08) at ../src/simulator/event-impl.cc:53
#16 0xfe906d85 in ns3::DefaultSimulatorImpl::ProcessOneEvent (this=0x807fb08) at ../src/simulator/default-simulator-impl.cc:125
#17 0xfe906fe3 in ns3::DefaultSimulatorImpl::Run (this=0x807fb08) at ../src/simulator/default-simulator-impl.cc:155
#18 0xfe8ef943 in ns3::Simulator::Run () at ../src/simulator/simulator.cc:145
#19 0x08057a33 in main (args=2, argv=0x8047550) at ../scratch/torquesim.cc:98
(gdb) quit

It's 1535 lines of total code in my simulator.  Holla if you want it.
Comment 1 lally 2009-01-13 11:04:43 EST
Also:
[lally@sol ns-3.2.1]$ uname -a
SunOS sol 5.11 snv_101b i86pc i386 i86pc

Comment 2 Tom Henderson 2009-01-14 00:43:22 EST
Your bug report is against ns-3.2.  Since then, there have been a couple of patches to fix the ArpCache behavior in this regard; can you give your code a try with ns-3-dev (at least, you might be able to drop in the arp-cache.{cc,h} and arp-l4-protocol.{cc,h} files that are in the current ns-3-dev without otherwise touching your code)?

Or, try applying these patches to see if it clears up?
http://www.nsnam.org/bugzilla/show_bug.cgi?id=362
http://www.nsnam.org/bugzilla/show_bug.cgi?id=451
Comment 3 lally 2009-01-14 20:18:19 EST
(In reply to comment #2)
> Your bug report is against ns-3.2.  Since then, there have been a couple of
> patches to fix the ArpCache behavior in this regard; can you give your code a
> try with ns-3-dev (at least, you might be able to drop in the arp-cache.{cc,h}
> and arp-l4-protocol.{cc,h} files that are in the current ns-3-dev without
> otherwise touching your code)?
> 
> Or, try applying these patches to see if it clears up?
> http://www.nsnam.org/bugzilla/show_bug.cgi?id=362
> http://www.nsnam.org/bugzilla/show_bug.cgi?id=451
> 

Copying the files seems to have worked.  I tried first to build ns-3-dev, but that didn't work out.  Something about not liking '-c -o'.
Comment 4 Tom Henderson 2009-01-15 00:51:09 EST

*** This bug has been marked as a duplicate of bug 451 ***
Comment 5 Tom Henderson 2009-01-15 00:52:17 EST
(In reply to comment #3)
> (In reply to comment #2)
> > Your bug report is against ns-3.2.  Since then, there have been a couple of
> > patches to fix the ArpCache behavior in this regard; can you give your code a
> > try with ns-3-dev (at least, you might be able to drop in the arp-cache.{cc,h}
> > and arp-l4-protocol.{cc,h} files that are in the current ns-3-dev without
> > otherwise touching your code)?
> > 
> > Or, try applying these patches to see if it clears up?
> > http://www.nsnam.org/bugzilla/show_bug.cgi?id=362
> > http://www.nsnam.org/bugzilla/show_bug.cgi?id=451
> > 
> 
> Copying the files seems to have worked.  I tried first to build ns-3-dev, but
> that didn't work out.  Something about not liking '-c -o'.
> 

Please file a separate bug if you think there is a problem with building ns-3-dev.
Comment 6 lally 2009-01-15 20:54:38 EST
Nevermind.  The bug reappears, only requiring more simulated clients to do so.  

Now it works at 36 clients, and fails at 37.  Before copying in the files suggested, the numbers were 23 and 24, respectively.

Should I post this under 451's entry?
Comment 7 Tom Henderson 2009-01-16 01:22:31 EST
(In reply to comment #6)
> Nevermind.  The bug reappears, only requiring more simulated clients to do so.  
> 
> Now it works at 36 clients, and fails at 37.  Before copying in the files
> suggested, the numbers were 23 and 24, respectively.
> 
> Should I post this under 451's entry?
> 
No, this is fine, it may be a separate problem.

The below is half-baked, but can you see whether it fixes your problem?

diff -r 8e5d0354c509 src/internet-stack/arp-cache.cc
--- a/src/internet-stack/arp-cache.cc   Wed Jan 14 15:36:19 2009 +0000
+++ b/src/internet-stack/arp-cache.cc   Thu Jan 15 22:34:02 2009 -0800
@@ -213,7 +213,12 @@ ArpCache::HandleWaitReplyTimeout (void)
                 }
             }
        }
-
+      else
+        {
+          // This will cause the entry expiration time to line up
+          // with the WaitReply timer for future timeouts
+          entry->UpdateSeen ();
+        }
     }
   if (restartWaitReplyTimer)
     {
diff -r 8e5d0354c509 src/internet-stack/arp-cache.h
--- a/src/internet-stack/arp-cache.h    Wed Jan 14 15:36:19 2009 +0000
+++ b/src/internet-stack/arp-cache.h    Thu Jan 15 22:34:02 2009 -0800
@@ -199,6 +199,7 @@ public:
      */
     void ClearRetries (void);
 
+    void UpdateSeen (void);
   private:
     enum ArpCacheEntryState_e {
       ALIVE,
@@ -206,7 +207,6 @@ public:
       DEAD
     };
 
-    void UpdateSeen (void);
     Time GetTimeout (void) const;
     ArpCache *m_arp;
     ArpCacheEntryState_e m_state;

Comment 8 lally 2009-01-19 20:00:21 EST
(In reply to comment #7)
> (In reply to comment #6)
> > Nevermind.  The bug reappears, only requiring more simulated clients to do so.  
> > 
> > Now it works at 36 clients, and fails at 37.  Before copying in the files
> > suggested, the numbers were 23 and 24, respectively.
> > 
> > Should I post this under 451's entry?
> > 
> No, this is fine, it may be a separate problem.
> 
> The below is half-baked, but can you see whether it fixes your problem?
> 
> diff -r 8e5d0354c509 src/internet-stack/arp-cache.cc
> --- a/src/internet-stack/arp-cache.cc   Wed Jan 14 15:36:19 2009 +0000
> +++ b/src/internet-stack/arp-cache.cc   Thu Jan 15 22:34:02 2009 -0800
> @@ -213,7 +213,12 @@ ArpCache::HandleWaitReplyTimeout (void)
>                  }
>              }
>         }
> -
> +      else
> +        {
> +          // This will cause the entry expiration time to line up
> +          // with the WaitReply timer for future timeouts
> +          entry->UpdateSeen ();
> +        }
>      }
>    if (restartWaitReplyTimer)
>      {
> diff -r 8e5d0354c509 src/internet-stack/arp-cache.h
> --- a/src/internet-stack/arp-cache.h    Wed Jan 14 15:36:19 2009 +0000
> +++ b/src/internet-stack/arp-cache.h    Thu Jan 15 22:34:02 2009 -0800
> @@ -199,6 +199,7 @@ public:
>       */
>      void ClearRetries (void);
> 
> +    void UpdateSeen (void);
>    private:
>      enum ArpCacheEntryState_e {
>        ALIVE,
> @@ -206,7 +207,6 @@ public:
>        DEAD
>      };
> 
> -    void UpdateSeen (void);
>      Time GetTimeout (void) const;
>      ArpCache *m_arp;
>      ArpCacheEntryState_e m_state;
> 

Thanks.  I tried it.  38 works. 39 doesn't.

I'm on opensolaris, I can get you dtrace logs of what's going on, if there's something you want me to check out.  It's no problem, I'm happy to help.
Comment 9 lally 2009-01-19 20:22:33 EST
Also, at larger numbers, say 100-ish, it dies out with this:
No Packet on queue during CsmaNetDevice::TransmitAbort()

Donno if that helps.
Comment 10 Tom Henderson 2009-02-12 11:12:59 EST
*** Bug 495 has been marked as a duplicate of this bug. ***
Comment 11 Tom Henderson 2009-02-19 00:20:34 EST
Created attachment 381 [details]
patch to fix

This is actually a carry-over of bug 451, which seems to have been the wrong way to fix this problem.

A single WaitReply timer is maintained in an ArpCache, and it fires every WaitReplyTimeout seconds.  The attribute documentation says it should behave as follows:
    .AddAttribute ("WaitReplyTimeout",                   "When this timeout expires, the cache entries will be scanned and entries in WaitReply state will resend ArpRequest unless MaxRetries has been exceeded, in which case the entry is marked dead",

However, the current code with the IsExpiring() check will suppress the first retransmission in cases in which there are multiple entries in WaitReply state in the cache.

If the IsExpiring() check is deleted, then the behavior will align with the attribute documentation.  What this means is that if there is already an entry in WaitReply state in the cache for another destination, the first retransmission of a new arp request (for a new destination) will occur sometime between 0 and WaitReplyTimeout seconds, depending on when the original request occurred.  Subsequent retransmissions will occur on the WaitReplyTimeout intervals.  

Regression traces do not change for this patch.
Comment 12 Tom Henderson 2009-02-23 10:08:48 EST
I would like to push this tomorrow if there are no further comments.
Comment 13 Tom Henderson 2009-03-03 08:29:24 EST
changeset 724c9442a9f0