Bug 2308 – ByteTag in LTE instead of PacketTag and metadata limit

Bug 2308 - ByteTag in LTE instead of PacketTag and metadata limit


Summary:	ByteTag in LTE instead of PacketTag and metadata limit

Status:	REOPENED

Product:	ns-3
Classification:	Unclassified
Component:	lte
Version:	pre-release
Hardware:	PC Linux

Importance:	P5 major
Assigned To:	Manuel Requena

URL:
Keywords:

Depends on:	231
Blocks:
	Show dependency tree / graph

Reported:	2016-02-26 06:02 EST by natale.patriciello
Modified:	2018-10-31 05:21 EDT (History)
CC List:	10 users (show)

See Also:	2069 2361 2221

Attachments
Use PacketTag instead of ByteTag (4.72 KB, patch) 2016-02-26 06:02 EST, natale.patriciello	Details \| Diff
TestCase (15.46 KB, patch) 2016-02-26 06:04 EST, natale.patriciello	Details \| Diff
patch to replace ByteTag usage with PacketTag (revising Natale's original) (3.98 KB, patch) 2016-05-12 14:55 EDT, Tom Henderson	Details \| Diff
Show Obsolete (1) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description natale.patriciello 2016-02-26 06:02:23 EST

TL;DR
LTE adds ByteTag, which follows bytes and are not removed; when recombining segments in the TcpRxBuffer, the limit on the amount of metadata operations is reached.

Possible fix: Use PacketTag instead of ByteTag.


Long explanation. The situation is the following:

there is a single TCP flow, running from an UE (connected to ENB through LTE) to a remote host (the path between ENB and remote is made by p2p links). Losses, and other factors, are forcing the TcpTxBuffer to store ~3 MB of data (we intentionally lose one segment, the first one, and data is queued until a retransmission reaches the buffer: only in this way we can give an ordered stream to the application). When that segment arrives, this is the code that is used by TcpRxBuffer:

  Ptr<Packet> outPkt = Create<Packet> ();
  BufIterator i;
  while (extractSize)
    { // Check the buffered data for delivery
      i = m_data.begin ();
      NS_ASSERT (i->first <= m_nextRxSeq); // in-sequence data expected
      // Check if we send the whole pkt or just a partial
      uint32_t pktSize = i->second->GetSize ();
      if (pktSize <= extractSize)
        { // Whole packet is extracted
          outPkt->AddAtEnd (i->second);
          m_data.erase (i);
          m_size -= pktSize;
          m_availBytes -= pktSize;
          extractSize -= pktSize;
        }
      else
        {
           ...
        }
    }

until we don't finish the bytes we need to extract, or we don't have anymore segments to add, we addAtEnd (to the packet that will reach the application) the segment pointed by the iterator the buffer (i->second). This means that, for many segments, we call many times AddAtEnd (in my experiment, more than 2500). AddAtEnd is doing copies of the Tags, and metadata management.

In fact, the operation outPkt->AddAtEnd (i->second) is failing, only if LTE is in use. Through adding a simple print statement, I found that *every* segment inside the TcpRxBuffer contained two tags, e.g :

ns3::PdcpTag [0-1340] +16828000000.0ns ns3::RlcTag [0-1340] +16836214285.0ns

I think that these tags should be removed when the packet leaves the LTE level; but this isn't happening, since I see these tag on Tcp level. And then, I started looking into the code: what I found is that in files lte-pdcp.cc lte-rlc-am.cc lte-rlc-tm.cc lte-rlc-um.cc lte-rlc.cc these Tag are inserted as ByteTag. By design, ByteTag follows byte and cannot be removed; or, to say it better, they can be removed but all togheter. So, with this strategy, we can lose all the other tag (such as FlowMonitor or NetAnim tags).
What I propose is to add these tags as PacketTag, and removing them after the segment is received.

We see that even valgrind is complaining:

==31115== 64,080 bytes in 12 blocks are still reachable in loss record 14,849 of 14,873
==31115==    at 0x4C2AC27: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31115==    by 0xC669E4D: ns3::ByteTagList::Allocate(unsigned int) (byte-tag-list.cc:375)
==31115==    by 0xC668767: ns3::ByteTagList::Add(ns3::TypeId, unsigned int, int, int) (byte-tag-list.cc:188)
==31115==    by 0xC68B540: ns3::Packet::AddByteTag(ns3::Tag const&) const (packet.cc:809)
==31115==    by 0x59F6A48: ns3::LtePdcp::DoTransmitPdcpSdu(ns3::Ptr<ns3::Packet>) (lte-pdcp.cc:191)
==31115==    by 0x59F7EFD: ns3::LtePdcpSpecificLtePdcpSapProvider<ns3::LtePdcp>::TransmitPdcpSdu(ns3::LtePdcpSapProvider::TransmitPdcpSduParameters) (lte-pdcp-sap.h:121)
==31115==    by 0x592A78F: ns3::UeManager::SendData(unsigned char, ns3::Ptr<ns3::Packet>) (lte-enb-rrc.cc:675)
==31115==    by 0x5937461: ns3::LteEnbRrc::SendData(ns3::Ptr<ns3::Packet>) (lte-enb-rrc.cc:1807)
==31115==    by 0x5A04CEB: ns3::LteEnbNetDevice::Send(ns3::Ptr<ns3::Packet>, ns3::Address const&, unsigned short) (lte-enb-net-device.cc:348)
==31115==    by 0xC705577: ns3::PacketSocket::SendTo(ns3::Ptr<ns3::Packet>, unsigned int, ns3::Address const&) (packet-socket.cc:340)
==31115==    by 0xC7046EE: ns3::PacketSocket::Send(ns3::Ptr<ns3::Packet>, unsigned int) (packet-socket.cc:268)
==31115==    by 0xC69CF53: ns3::Socket::Send(ns3::Ptr<ns3::Packet>) (socket.cc:137)

I add the proposed patch and a test case.

Comment 1 natale.patriciello 2016-02-26 06:02:53 EST

Created attachment 2301 [details]
Use PacketTag instead of ByteTag

Comment 2 natale.patriciello 2016-02-26 06:04:15 EST

Created attachment 2302 [details]
TestCase

The testcase setup an LTE environment, with a remote host, and has a slightly modified receive function, which notify the application only when the data exceeds a 4MB boundary (in order to trigger the bug). I'd like to point out that this situation, while forced here, could happen naturally when using Tcp.

Comment 3 natale.patriciello 2016-03-01 09:14:18 EST

Afftecting another user: https://groups.google.com/forum/#!topic/ns-3-users/cJHlPx_LKkY

Comment 4 Tom Henderson 2016-05-12 14:51:39 EDT

Here is some analysis of this problem and a slightly revised patch. Patch is slightly revised to ensure that same packet tag is not added twice. All tests pass with the revised patch.

There are several tags used in the LTE stack:
- EpsBearerTag
- PdcpTag
- LtePhyTag <-- Note: this is implemented and never used; could be removed
- LteRadioBearerTag
- LteRlcSduStatusTag
- RlcTag

Most tag usage is Packet tag, but there are two cases where byte tags are used:
- PdcpTag is used to tag a timestamp at the PDCP layer
- RlcTag is used to tag a timestamp at the RLC layer
These are used to measure one-way latency at specific points in the stack.

Fragmentation and reassembly can happen at RLC layer, so this is conceivably why ByteTags were chosen for PdcpTag and RlcTag.

The basic packet flow is as follows. PdcpTag is added at PDCP layer. At RLC layer below, packet may be fragmented. In this case, both fragments get same PDCP timestamp tag. At receiver RLC layer, packets are reassembled using Packet::AddAtEnd(). At PDCP layer, the PdcpTag is read.

In current ns-3-dev code, using byte tags for this, the reassembled packet contains the right byte tags. However, the PDCP code will read the first byte tag found:

if (p->FindFirstMatchingByteTag (pdcpTag))

which will come from the first fragment.

If we change the usage to PacketTag, as proposed in the patch, the Packet::AddAtEnd() will retain only the first fragment's packet tag, and then p->PeekPacketTag (pdcpTag) will find it.

In either case, it is only the first fragment's tag that is ultimately read at PDCP layer. So, it appears that there is no harm in changing this usage (and same analysis applies to RlcTag, I believe).

In summary, as long as the timestamp of the first fragment is the one that matters, it should not matter to change the model from Byte tag to Packet tag. If this is not the case, I would argue that current usage of FindFirstMatchingByteTag() might be in error since it effectively finds the timestamp of the first fragment.

We could also consider to add some asserts to check that when reassembly occurs, the fragments being combined have the same timestamp values; I haven't checked this.

And as a small associated cleanup, we could also remove LtePhyTag from the codebase.

Comment 5 Tom Henderson 2016-05-12 14:55:35 EDT

Created attachment 2431 [details]
patch to replace ByteTag usage with PacketTag (revising Natale's original)

Comment 6 natale.patriciello 2016-05-18 11:26:11 EDT

(In reply to Tom Henderson from comment #5)
> Created attachment 2431 [details]
> patch to replace ByteTag usage with PacketTag (revising Natale's original)

A question: we have lines such as 

if (p->PeekPacketTag (pdcpTag))

if no tags are present, this works as expected, but if another tag is present (or, in other words, the last added tag is not a Pdcp one) this will produce strange effects. Am I correct?

Comment 7 Tom Henderson 2016-05-18 14:18:38 EDT

(In reply to natale.patriciello from comment #6)
> (In reply to Tom Henderson from comment #5)
> > Created attachment 2431 [details]
> > patch to replace ByteTag usage with PacketTag (revising Natale's original)
> 
> A question: we have lines such as 
> 
> if (p->PeekPacketTag (pdcpTag))
> 
> if no tags are present, this works as expected, but if another tag is
> present (or, in other words, the last added tag is not a Pdcp one) this will
> produce strange effects. Am I correct?

PeekPacketTag should peek the first such tag of the matching type, so no, I think it will not produce strange effects.

By the way, Manuel Requena took this on and will probably push a revised patch shortly.

Comment 8 Manuel Requena 2016-05-19 06:04:10 EDT

According to the doc, PeekPacketTag returns true/false if it finds the tag or not.

I'm still working in the bug because I didn't understand why the PacketTags sometimes were lost and sometimes were not lost. In fact, the segmentation process in RLC removed all the PacketTags due to how the new Packet was created with the different PDU segments. I have patched RLC (not yet pushed) and now I have bumped into bug 231 [1]. There is a problem with the PacketTags in UDP and some testcases crashes. Is this bug to be corrected soon?

[1] https://www.nsnam.org/bugzilla/show_bug.cgi?id=231

Comment 9 natale.patriciello 2016-05-19 06:49:45 EDT

Ok for PeekPacketTag (the implementation is smarter than I originally though).

For SocketAddressTag, what is exactly the problem, and why that tag is used?

Comment 10 Manuel Requena 2016-05-19 11:16:07 EDT

In the repo/changeset http://code.nsnam.org/mrequena/ns-3-dev-bug2308/rev/6983b3a4ad5f you can find the patch to change the ByteTag to PacketTag.

Now, as the RLC entities do not remove all the PacketTags from the Packet, 3 tests crashes:
lte-x2-handover, lte-x2-handover-measures and lena-x2-handover. All three crashes are in udp-socket-impl.cc (line 1039: packet->AddPacketTag (tag)) because a SocketAddressTag cannot be added twice. This is the problem of bug 231.

Comment 11 Manuel Requena 2016-06-10 11:13:23 EDT

pushed in changeset: 12149:827444bf2f5d

Comment 12 Dmitry Kovkov 2016-06-22 12:18:28 EDT

I am also working on some project involving LTE simulation, but unfortunately cannot disclose details.

Current solution is only partially correct, breaks measurements at PDCP.

1. Fix at RLC looks mostly OK, since whatever PDU leaves transmitting RLC, will arrive in the same size at receiving RLC. The correctness depends on LTE MAC implementation details. Current MAC implementation does not merge RLC PDUs, so no problems.
2. PDCP measurements are certainly broken. RLC does not only split long PDUs, it also merges small ones (adds extended header etc. in this case).
If there are splits only, no problems, PDCP measurements will be OK.
If there was any merge at RLC, PDCP timestamps will also be merged. It means if some TCP packets (PDCP SDUs) arrive consecutively, but at different times, then spend some time in RLC input queue, then this UE is given transmission opportunity big enough to transmit several SDUs, they will be merged by RLC. It means that PDCP timestamps will be 'flattened', set to the PDCP TS of the first PDCP PDU within constructed RLC PDU. After reception, every PDCP TS except the one for first SDU will be wrong. First SDU's timestamp may also be wrong, in case this SDU was carried by several RLC PDUs (split between this RLC PDU and some earlier one).
2a. If some future implementation of LTE MAC will merge RLC PDUs, same problems will arise for RLC measurements (we would need to have several active bearers to activate the bug).

I would advice to backout PDCP changes, as they certainly break PDCP measurements. RLC tags should be carefully reconsidered, maybe some more issues slipped undetected.

Comment 13 Manuel Requena 2016-07-05 04:29:02 EDT

Even if you cannot disclose details of your project, could you provide a simple example where the problem happens? How often this happens?

I guess the problem is in the measurement of the delay. Do you have some numbers about the errors introduce in the PDCP measurements?

Currently, with PacketTag there is no easy solution (Only one PacketTag is allowed per ns-3 Packet and one ns-3 Packet can contain several PDCP PDUs) and with ByteTags there is a metadata memory problem with TCP.

Comment 14 Tom Henderson 2016-07-28 11:17:30 EDT

My understanding is that we are presently stuck with this bug:
- Packet tags are not appropriate for this use case due to the packet concatenation
- Byte tags cause problems with TCP's use of AddAtEnd().

It seems that we may need a byte tag removal capability, or else to avoid use of tags for this PDCP measurement facility, or else to refactor TCP to avoid the metadata/tag limit being reached.

Comment 15 Dmitry Kovkov 2016-07-28 11:28:53 EDT

Does patch from 2069 ( https://www.nsnam.org/bugzilla/show_bug.cgi?id=2069 ) solve or lessens the problem?

Comment 16 Biljana Bojović 2016-11-17 11:12:23 EST

Is there some news about this bug? 

My understanding is that 2069 bug solves some issue in AddAtEnd function, but I do not understand if this was causing this (2308) bug? 

If yes, does it mean that we should undo the changeset: 12149:827444bf2f5d and return to usage of byte tags?

Did someone get chance to test that?

Thanks

Comment 17 natale.patriciello 2016-11-17 11:36:11 EST

(In reply to Tom Henderson from comment #14)
> - Byte tags cause problems with TCP's use of AddAtEnd().

More in general I would say that byte tags are added and never removed; this is generally dangerous.

> It seems that we may need a byte tag removal capability, or else to avoid
> use of tags for this PDCP measurement facility, or else to refactor TCP to
> avoid the metadata/tag limit being reached.

I think option 1 and 2 work best here. TCP refactoring would solve the bug use-case, not resolving the core issue.

(In reply to Dmitry Kovkov from comment #15)
> Does patch from 2069 ( https://www.nsnam.org/bugzilla/show_bug.cgi?id=2069 ) solve or lessens the problem?

Probably it could solve it (I didn't tested) but it can re-appear under other situations (e.g. an application which transmits a very large chunk of data over LTE).


So, in the end I would ask how these measurements are performed in real networks, because with ByteTags we are using "magic".

Comment 18 Tom Henderson 2016-11-17 13:04:01 EST

(In reply to Biljana Bojović from comment #16)
> Is there some news about this bug? 
> 
> My understanding is that 2069 bug solves some issue in AddAtEnd function,
> but I do not understand if this was causing this (2308) bug? 
> 
> If yes, does it mean that we should undo the changeset: 12149:827444bf2f5d
> and return to usage of byte tags?

It seems that Dmitry points out that 12149 breaks some things, not detected by our regression tests.   So if you agree, it probably needs to be reverted and another solution found.

Natale indirectly asks whether the feature can be supported without tags?

No one has a patch for enabling byte tag removal yet, as far as I know.

- Tom

Comment 19 Biljana Bojović 2016-11-17 14:27:33 EST

Natale and Tom, thanks a lot for explanations. 

I understand that we basically have at this point two options:

- (the current, 12149) The "packetTag" solution which is assert safe, but will not always provide precise PDCP measurements. 

- (old) The solution with "byteTags" which may cause asserts in some scenarios, but has correct PDCP measurements.

Since we do not have any test/example that would help us evaluate how critical is the issue with PDCP measurements, I think that is more safe to stay with 12149 until "byteTag removal" functionality is available.

Comment 20 natale.patriciello 2018-10-26 03:50:15 EDT

Hello everyone, 

I have more problem with this bug, again.
My initial statement of "Replace byteTags with packetTags" has now been superseded by a new requirement, that is the possibility to do packet fragmentation and reassembly at MAC level.

In NR (but as well it was in LTE, just no one implemented that) it is possible to construct a PDU (MAC level) that is generated by multiple packets coming from RLC. This means that the MAC will destroy all the PacketTags added by the layers above. I need this feature; if I wish to stay in sync with mainline with NR, I have to solve this problem.

As these tags are added to have the possibility to do measurements of delays between sender-receiver layers, we have two options:

- Remove the timing feature
- Remove PacketTags, adding a rule to the documentation saying that a hand will be cut in case of usage, and rewrite the ByteTags to allow removal and merging (what happens when two packets are being merged with the same byte tag). Then replace all PacketTags with ByteTags.

Added in CC Peter for the core issue, and more guys that may have a word in this discussion, I hope their fresh mind can find an intermediate solution that requires less time.

Comment 21 Tom Henderson 2018-10-26 09:12:09 EDT

(In reply to natale.patriciello from comment #20)

> 
> As these tags are added to have the possibility to do measurements of delays
> between sender-receiver layers, we have two options:
> 
> - Remove the timing feature
> - Remove PacketTags, adding a rule to the documentation saying that a hand
> will be cut in case of usage, and rewrite the ByteTags to allow removal and
> merging (what happens when two packets are being merged with the same byte
> tag). Then replace all PacketTags with ByteTags.
> 

If I understand correctly, the issue is that you need to carry a timestamp from sender to receiver at the RLC layer to support timings, and tags are proving clumsy to use for this purpose.

Just a thought, but would there be a possible solution to just explicitly carry the timestamp(s) across the APIs and in signal structures rather than use tags to carry them?  Although these are not officially part of standard APIs, you could leave a comment that they are added to support measurements.  LTE uses custom data structures to carry around packets already at the lower layers, so it seems to me that you may not be constrained to use tags.

You still have a separate issue of how to manage the semantics of these timestamps when you are merging/fragmenting these PDUs but at least you could just design something without constraint of the tag mechanism.

Comment 22 Dmitry Kovkov 2018-10-28 03:45:14 EDT

I would recommend consider adding a new removing feature to byte tags, namely, removing all bytetags of a specific typeid from a packet.

This would solve most of problems. This way, LTE adds its measurement data into byte tags in TX path, removes them altogether within RX path, before sending to upper layers.
Using additional measurement structures, as Tom suggests, would just replicate byte tag semantics (since the packets would be merged/split anyway, in several layers). I do not think this duplication is reasonable.

Comment 23 natale.patriciello 2018-10-31 05:21:12 EDT

> but would there be a possible solution to just explicitly carry the timestamp(s) across the APIs

I think in this way we will break internal projects that are based on the separation of layers based on the existing interfaces.

I will take a look on what would be the cost of removing byteTags and merging them (I believe there should be a merge: what will happen if two fragments with the same byte tag are being merged in a single packet?)