Java: remove CDATA tag from XML


Question

XPath is nice for parsing XML files, but it does not work for data inside a CDATA section:

<![CDATA[ Some Text <p>more text and tags</p>... ]]>

My solution: get the content of the XML first and remove

"<![CDATA["  and  "]]>".

After that, I would run XPath to reach everything in the XML file. Is there a better solution? If not, how can I do it with a regular expression?

Recommended answer

The reason for the CDATA markers there is that everything inside them is plain text, nothing that should be interpreted directly as XML. You could alternatively write the document fragment from your question as

 Some Text &lt;p&gt;more text and tags&lt;/p&gt;... 

(with the leading and trailing space).
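To illustrate the equivalence described above, here is a minimal sketch using the JDK's built-in DOM parser. The `<item>` wrapper element and the class and method names are assumptions made for the example; they do not come from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataEquivalence {
    // Parses an XML string and returns the text content of its root element.
    // CDATA sections and escaped character data both contribute plain text here.
    public static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // <item> is a hypothetical wrapper element used only for this demo.
        String withCdata = "<item><![CDATA[ Some Text <p>more text and tags</p>... ]]></item>";
        String escaped = "<item> Some Text &lt;p&gt;more text and tags&lt;/p&gt;... </item>";

        // Both forms yield the same string, including the literal "<p>" characters.
        System.out.println(rootText(withCdata).equals(rootText(escaped)));
    }
}
```

In both cases the parser hands back the same string, which is why XPath sees a single text value rather than a `p` element.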

If you really want to interpret this as XML, extract the text from your document and submit it to an XML parser again.
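The extract-and-reparse approach can be sketched as follows, using only the JDK's `javax.xml` APIs. The element names `doc`, `body`, and `wrapper`, and the class and method names, are hypothetical. The sketch also assumes the CDATA content is well-formed XML once wrapped in a single root element, which is often not true of real-world HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataReparse {
    // Parses an XML string into a DOM document.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Reads the CDATA content of /doc/body as plain text, reparses it,
    // and returns the text of the first <p> element found inside it.
    public static String innerParagraph(String outerXml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Step 1: the string value of the element already contains the CDATA text.
        String text = xpath.evaluate("/doc/body", parse(outerXml));

        // Step 2: wrap the fragment in a synthetic root (it has none of its own)
        // and hand it to the parser a second time.
        Document inner = parse("<wrapper>" + text + "</wrapper>");

        // Step 3: XPath can now reach the elements that were hidden in the CDATA.
        return xpath.evaluate("/wrapper/p", inner);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><body><![CDATA[ Some Text <p>more text and tags</p>... ]]></body></doc>";
        System.out.println(innerParagraph(xml));
    }
}
```

Unlike stripping `<![CDATA[` and `]]>` with a regular expression, this leaves the original document untouched and only fails on the embedded fragment if that fragment is not well-formed.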
