Channel: When parsing Atom XML feeds, how should conflicting CDATA and entity escaped elements be handled? - Stack Overflow

When parsing Atom XML feeds, how should conflicting CDATA and entity escaped elements be handled?

How should an Atom feed parser handle the following line of XML in a feed:

<title type="html"><![CDATA[Johnson &amp; Johnson]]></title>

For the sake of discussion, let's assume that the originally intended text was in fact Johnson & Johnson. I came across an online discussion about this issue, and there seemed to be two different opinions:

  1. Opinion #1 claims that this content is double-encoded: the text "Johnson & Johnson" was entity-escaped and then encoded again by being wrapped in a CDATA section. Its author states that a well-behaved XML parser will return Johnson &amp; Johnson, because this is how the XML spec says CDATA sections should be handled.

  2. Opinion #2 claims that the Atom spec takes precedence. Its author states that the CDATA section acts as a passthrough, so Johnson &amp; Johnson comes out of the XML parser as Johnson &amp; Johnson. If this were just an XML document, it would end there. However, because it is Atom, we must then look at the Atom spec to determine the proper behavior. The Atom spec states that any element with type="html" contains entity-escaped HTML. Therefore, we should be free to decode it.

Which of these is factually correct? Should a proper Atom XML parser produce Johnson & Johnson or Johnson &amp; Johnson in this particular situation?
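
To make the two layers concrete, here is a minimal Python sketch using only the standard library. It assumes the title sits inside an Atom entry element (the wrapper and the variable names are mine, for illustration), and it uses html.unescape as a stand-in for real HTML processing; an actual feed reader would likely parse and sanitize the HTML rather than only decode entities.

```python
import html
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical wrapper around the title line from the question.
cdata_feed = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title type="html"><![CDATA[Johnson &amp; Johnson]]></title>'
    '</entry>'
)

# The same character data written with XML escaping instead of CDATA.
escaped_feed = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title type="html">Johnson &amp;amp; Johnson</title>'
    '</entry>'
)

def title_element(feed_xml):
    """Parse the entry and return its Atom title element."""
    return ET.fromstring(feed_xml).find(f"{{{ATOM_NS}}}title")

title = title_element(cdata_feed)

# Layer 1 (XML): the CDATA section is passed through verbatim,
# so the parser reports the text with the entity still in it.
xml_text = title.text

# Both spellings of the document carry the same character data.
same_as_escaped = title_element(escaped_feed).text == xml_text

# Layer 2 (Atom): type="html" says the string is HTML markup,
# so a consumer that wants display text decodes it once more.
if title.get("type") == "html":
    display_text = html.unescape(xml_text)
else:
    display_text = xml_text

print(xml_text)
print(display_text)
```

Under these assumptions the XML layer yields the string with the entity intact, and the decode happens only because of the type attribute, not because of the CDATA section.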

