Special characters in XML files - processing with the DOM API

Views: 130 Published: 2017/6/25 3:24:00 Tags: xml dom special-characters


Problem description

I have a file in XML format (it consists of just the root start and end tags and the children of the root). The text elements of the children contain the ampersand symbol &. A bare & is not allowed in a valid XML document, and when I tried to process the file using the DOM API in Java and an XML parser, I got parsing errors. Therefore I replaced & with &amp;, and the file was processed successfully: I had to extract the values of the text elements into separate plain text files.

When I opened these newly created text files, I expected to see &amp;, but there was & instead. Why is this? I stored the text in text files without any extension (my original XML file did not have an .xml extension either), and I see just & in the text of the new file no matter how I open it: as a txt file or as an xml file (these are some of the options in my XML editor). What exactly happens? Does Java (?) convert &amp; to & automatically? Or is there some default encoding? Well, &amp; stands for &, and I suppose there is some "invisible" automatic conversion, but I am confused about when and how this happens. Here are examples of my original file and of the extracted file I get after processing the original file with Java:

This is my "negative.review" file in XML format:
<review>
<review_text>
I will not wear it as it is too big &amp; looks funny on me. 
</review_text>
</review>
This is my extracted file "negative_1":
I will not wear it as it is too big & looks funny on me. 
For me it is important to keep the original data as it is (without any conversions/replacements), so I thought I would have to process the extracted file "negative_1" and convert &amp; back to &. As you can see, it seems I don't have to do this. But I don't understand why :(.

Thank you in advance!
Solution
The reason is simple: The XML file really contains an "&" character.

It is just represented differently (i.e. it is "escaped"), because a real "&" on its own breaks XML files, as you've seen. Read the relevant section in the XML 1.0 spec: "2.4 Character Data and Markup". It's just a few lines, but it explains the issue quite well.
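
To see the parser reject a bare "&", here is a minimal sketch using the standard JAXP DOM parser. The class name `AmpersandCheck` and the helper `isWellFormed` are made up for this illustration; they are not part of any API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandCheck {

    // Hypothetical helper: returns true if the string parses as well-formed XML.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A bare '&' is a well-formedness error, so the parser ends up here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<review_text>too big & looks funny</review_text>"));
        System.out.println(isWellFormed("<review_text>too big &amp; looks funny</review_text>"));
    }
}
```

The first call should report false and the second true, which matches the parsing errors described in the question.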

XML is a representation of data (!). Don't think of it as a text file. Example:

You want to store the string "17 < 20" in an XML file. Initially, you can't, since the "<" is reserved as the opening tag bracket. So this would be invalid:
<xml>17 < 20</xml>
Solution: You escape the special/reserved character, just to keep the file valid:
<xml>17 &lt; 20</xml>
For all practical purposes the above snippet contains the following data (in JSON representation this time):
{
  "xml": "17 < 20"
}
This is why you see the real "&" in your post-processing. It had been escaped in just the same way, but its meaning stayed the same all the time.
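
The same thing can be reproduced with the DOM API in Java. This is only a sketch under assumptions: the asker's actual code is not shown, so the class name `ReviewTextExtractor` and the helper `extractReviewText` are hypothetical, and the XML is passed as a string instead of being read from "negative.review".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class ReviewTextExtractor {

    // Hypothetical helper: parses the XML and returns the trimmed text
    // of the first <review_text> element.
    static String extractReviewText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Node reviewText = doc.getElementsByTagName("review_text").item(0);
        // The parser has already resolved &amp; to the character '&'.
        return reviewText.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<review>\n<review_text>\n"
                + "I will not wear it as it is too big &amp; looks funny on me.\n"
                + "</review_text>\n</review>";
        // prints: I will not wear it as it is too big & looks funny on me.
        System.out.println(extractReviewText(xml));
    }
}
```

The conversion happens inside the XML parser while it builds the DOM tree, not when the text is written to the plain text file.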

The above example also explains why the "&" must be treated specially: It is itself part of the XML escaping mechanism. It marks the start of an escape sequence, as in "&lt;". Therefore it must be escaped itself (with "&amp;", as you've done).
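
The escaping also works in the other direction: when a DOM tree is written back out as XML, the serializer escapes the "&" again. The following is a rough sketch using the standard `javax.xml.transform` API; the class name `EscapeOnWrite` and the helper `serialize` are invented for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EscapeOnWrite {

    // Hypothetical helper: wraps plain text in a <review_text> element
    // and returns the serialized XML as a string.
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("review_text");
        root.setTextContent(text); // plain data, no manual escaping
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The serialized form should contain "&amp;" in place of the raw '&'.
        System.out.println(serialize("too big & looks funny"));
    }
}
```

So as long as the data goes through an XML parser on the way in and an XML serializer on the way out, there is no need to replace "&" by hand in either direction.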
