Bug 2170

Summary: AnimationInterface outputs improperly formed XML
Product: ns-3 Reporter: miilic
Component: netanimAssignee: John Abraham <john.abraham.in>
Status: REOPENED ---    
Severity: normal CC: ns-bugs, riley
Priority: P5    
Version: pre-release   
Hardware: PC   
OS: Linux   
Attachments: Patch to fix XML output of AnimationInterface so that it closes all elements properly
A small patch to add node description containing XML special character to the dumbbell test case
Patch to overcome XML string encoding issues by manually escaping the critical characters
Patch that makes AnimationInterface use libxml2 for XML output, if possible

Description miilic 2015-08-22 10:36:08 EDT
Created attachment 2121 [details]
Patch to fix XML output of AnimationInterface so that it closes all elements properly

Running an example NetAnim files on ns-3.23 and on development branch creates an improperly formed XML output files. Doing

./waf --run "dumbbell-animation --nLeftLeaf=5 --nRightLeaf=5 --animFile=dumbbell.xml"
./waf --run "grid-animation --xSize=5 --ySize=5 --animFile=grid.xml"

generates dumbbell.xml and grid.xml in which only the root element ("anim") is actually closed. No other XML element is closed, only opened, causing most XML parsers to complain. For example, on Linux:

[miroslav@localhost ns-3-dev]$ xmlwf dumbbell.xml 
dumbbell.xml:18191:2: mismatched tag

[miroslav@localhost ns-3-dev]$ xmlwf grid.xml 
grid.xml:3046:2: mismatched tag

Similar messages come from opening those files in Firefox or XML Copy Editor. They are caused by closing the "anim" element when there are many child elements left unclosed. The files look like this:

<anim ver="netanim-3.106" filetype="animation" >
<node id="0" sysId="0" locX="49.5" locY="49.5" >
<node id="1" sysId="0" locX="69.3" locY="49.5" >
(...)
<p fId="10" fbTx="1.53463034" lbTx="1.53472314" tId="5" fbRx="1.53663034" lbRx="1.53672314" >
<p fId="5" fbTx="1.53672314" lbTx="1.53681594" tId="0" fbRx="1.53872314" lbRx="1.53881594" >
</anim>

Note that "node", "p" and other elements are opened but never closed by the AnimationInterface's XML output functions. Also note that the NetAnim application apparently uses a simple XML streaming parser that only fires events when first encountering an element, so it is not affected by this bug. Other potential users of these XML files would probably suffer, depending on the kind of XML parser they use.

A proposed patch is attached. With it, the tests produce well-formed XML which can be parsed by xmlwf, Firefox and XML Copy Editor correctly. I have tested that the current version of NetAnim application can also process them correctly, but have done so only on the two listed examples.
Comment 1 John Abraham 2015-08-27 21:12:01 EDT
(In reply to miilic from comment #0)
> Created attachment 2121 [details]
> Patch to fix XML output of AnimationInterface so that it closes all elements
> properly
> 
> Running an example NetAnim files on ns-3.23 and on development branch
> creates an improperly formed XML output files. Doing
> 
> ./waf --run "dumbbell-animation --nLeftLeaf=5 --nRightLeaf=5
> --animFile=dumbbell.xml"
> ./waf --run "grid-animation --xSize=5 --ySize=5 --animFile=grid.xml"
> 
> generates dumbbell.xml and grid.xml in which only the root element ("anim")
> is actually closed. No other XML element is closed, only opened, causing
> most XML parsers to complain. For example, on Linux:
> 
> [miroslav@localhost ns-3-dev]$ xmlwf dumbbell.xml 
> dumbbell.xml:18191:2: mismatched tag
> 
> [miroslav@localhost ns-3-dev]$ xmlwf grid.xml 
> grid.xml:3046:2: mismatched tag
> 
> Similar messages come from opening those files in Firefox or XML Copy
> Editor. They are caused by closing the "anim" element when there are many
> child elements left unclosed. The files look like this:
> 
> <anim ver="netanim-3.106" filetype="animation" >
> <node id="0" sysId="0" locX="49.5" locY="49.5" >
> <node id="1" sysId="0" locX="69.3" locY="49.5" >
> (...)
> <p fId="10" fbTx="1.53463034" lbTx="1.53472314" tId="5" fbRx="1.53663034"
> lbRx="1.53672314" >
> <p fId="5" fbTx="1.53672314" lbTx="1.53681594" tId="0" fbRx="1.53872314"
> lbRx="1.53881594" >
> </anim>
> 
> Note that "node", "p" and other elements are opened but never closed by the
> AnimationInterface's XML output functions. Also note that the NetAnim
> application apparently uses a simple XML streaming parser that only fires
> events when first encountering an element, so it is not affected by this
> bug. Other potential users of these XML files would probably suffer,
> depending on the kind of XML parser they use.
> 
> A proposed patch is attached. With it, the tests produce well-formed XML
> which can be parsed by xmlwf, Firefox and XML Copy Editor correctly. I have
> tested that the current version of NetAnim application can also process them
> correctly, but have done so only on the two listed examples.


Thanks I pushed it, with minor modifications.


http://code.nsnam.org/ns-3-dev/rev/1de158f7e17a

please review
Comment 2 John Abraham 2015-08-27 21:13:43 EDT
Please re-open if there are more problems in the format
Comment 3 miilic 2015-08-28 04:50:06 EDT
Looks good. Tested with the two NetAnim examples (dumbbell and grid); output is now properly formatted and can be parsed by xmlwf, XML Copy Editor and Firefox.
Comment 4 miilic 2015-09-03 08:47:52 EDT
Created attachment 2137 [details]
A small patch to add node description containing XML special character to the dumbbell test case

This adds a node description that reads "Bell's & Whistle's" to the dumbbell animation test to check XML string encoding correctness.
Comment 5 miilic 2015-09-03 09:13:15 EDT
Created attachment 2138 [details]
Patch to overcome XML string encoding issues by manually escaping the critical characters

Patch to overcome XML string encoding issues by manually escaping the critical characters when adding new attribute values to the output XML.
Comment 6 miilic 2015-09-03 09:14:51 EDT
Created attachment 2139 [details]
Patch that makes AnimationInterface use libxml2 for XML output, if possible

Patch that makes AnimationInterface use libxml2 for XML output, if possible. If libxml2 is unavailable, the code will revert to the default, manual string XML-forming approach.
Comment 7 miilic 2015-09-03 09:15:36 EDT
After some more testing, I think there are some additional issues because of which this bug should be reopened.

Scenario:
Make the AnimationInterface output a text string containing the so-called XML "predefined entities", i.e. characters that have to be escaped in well-formed XML (", ', <, >, &). See https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML or http://stackoverflow.com/questions/1091945/what-characters-do-i-need-to-escape-in-xml-documents for starting references. This can be achieved by, for example, changing one of the test cases to add a Node description that contains those characters. A patch to add that to the dumbbell test is attached. Running the modified dumbbell test as usual (./waf --run "dumbbell-animation --nLeftLeaf=5 --nRightLeaf=5 --animFile=dumbbell.xml") generates an XML file which is not well-formed. Since AnimationInterface doesn't escape special XML characters, parsers will break when encountering them:
[miroslav@localhost ns-3-dev]$ xmlwf ~/Desktop/dumbbell_encoding_issue.xml 
/home/miroslav/Desktop/dumbbell_encoding_issue.xml:89:38: not well-formed (invalid token)
[miroslav@localhost ns-3-dev]$ grep Whistle ~/Desktop/dumbbell_encoding_issue.xml 
<nu p="d" t="0" id="0" descr="Bell's & Whistle's" />
We can see definitely that the problem exists because the special characters are not escaped.
Additionally, this also breaks the NetAnim GUI, in the way that it doesn't report any issues, but animation replay instantly ends. It is as if it just stopped parsing the XML when it encountered the invalid string, but didn't report it, and then there was nothing to play.

I created two different patches for this issue.

1) A simple, quick and (somewhat) dirty way updates the AnimXmlElement::AddAttribute() to escape all the critical characters. This will (obviously) work only on the attribute values, which are the most critical point, and is a bit overzealous. For example, escaping the single-quotes (') inside a double-quoted (") attribute value is not mandatory, but this approach will do it anyway. It produces a well-formed XML which the NetAnim GUI can load and play properly, with the description properly displayed.

2) A more complex, but probably more correct, way relies on the use of libxml2 XML handling library. It is already used by the ns-3 (see config-store modules). Since it is a specialized XML library, it covers the XML standard much better and is a subject to independent upstream patching. In case the library is unavailable, the patch relies on (already existing) HAVE_LIBXML2 pre-processor definition to fall back to the existing, manual string forming, approach. The fallback includes changes from the simple patch (1). Tutorial used as a basis to create the patch can be found at http://www.xmlsoft.org/examples/testWriter.c The main changes are to the AnimationInterface::AnimXmlElement class. One possible issue could be the XML writing speed for larger/more complex animation cases, although I haven't noticed any slowdown on the dumbbell example.

Both patches could probably use more testing, esp. for XML corner cases.
Comment 8 John Abraham 2015-09-06 12:14:15 EDT
Thanks for the patches. I will review them.
Comment 9 John Abraham 2015-09-13 11:13:21 EDT
(In reply to miilic from comment #5)
> Created attachment 2138 [details]
> Patch to overcome XML string encoding issues by manually escaping the
> critical characters
> 
> Patch to overcome XML string encoding issues by manually escaping the
> critical characters when adding new attribute values to the output XML.


One general comment I had was, I am more in favor of making this string replace operation 
1. optional. i.e for only string type attributes
2. optional. only if the user enables a setting globally to escape characters.

Well the reason is that, AnimationInterface already uses disk I/O which does slow down the simulation by a good amount and a character-by-character check for routine cases may be expensive in terms of simulation time.



I'm waiting for ns-3.24 to go out , to test other patches
Comment 10 John Abraham 2015-10-01 11:05:11 EDT
I pushed one part of the patch

http://code.nsnam.org/ns-3-dev/rev/5ec0a2c8aef2


I used string::iterator as it was unsafe to iterate over a pointer from c_str().

I also made the escaping operation , optional, to be used only for string type attributes