org.xml.sax.SAXParseException: The reference to entity "T" must end with the ';' delimiter

Problem description

I am trying to parse an XML file which contains some special characters like "&" using a DOM parser. I am getting the SAXParseException "The reference to entity "T" must end with the ';' delimiter". Is there any way to overcome this exception? I cannot modify the XML file to remove the special characters, since it comes from a different application. Please suggest a way to parse this XML file to get the root element.

Thanks in advance.

This is the part of the XML which I am parsing:

<P>EDTA/THAM WASH 
</P>

<P>jhc ^ 72. METER SOLVENT: Meter 21 LITERS of R. O. WATER through the add line into 
FT-250. Start agitator. 
</P>

<P>R. O. WATER &lt;ZLl LITERS </P>

<P>•     NOTE: The following is a tool control operation. The area within 10 feet of any open vessel or container is under tool control. </P>

<P>-af . 73. CHARGE SOLIDS: Remove any unnecessary items from the tool controlled area. Indicate the numbers of each item that will remain in the tool controlled area during the operation in the IN box of the Tool Control Log. </P>

<P>^___y_ a. To minimize the potential for cross contamination, confirm that no other solids are being charged or packaged in adjacent equipment. </P>

<P>kk k WARNING: Wear protective gloves, air jacket and use local exhaust when handling TROMETHAMINE USP (189400) (THAM) (K-l--Irritant!). The THAM may be dusty. </P>

<P>-&lt;&amp;^b .   Charge 2.1 KG of TROMETHAMINE USP (189400) (THAM) into FT-250 through the top. </P>

<P>TROMETHAMINE USP (189400) (THAM) </P>

<P>Scale ID:     / / 7S </P>

<P>LotNo.:   qy/o^yo^ </P>

<P>Gross:    ^ . S </P>

<P>Tare: 10 ,1 </P>

<P>Net:     J^l </P>

<P>Total:   JL'J </P>

<P><Figure ActualText="&T ">

<ImageData src="images/17PT 07009K_img_1.jpg"/>
&amp;T </Figure>
Checked by </P>

Recommended answer

As others have stated, your XML is definitely invalid (strictly speaking, it is not well-formed). However, if you can't change the generating application but can add a cleaning step, then the following should clean up the XML:

String clean = xml.replaceAll( "&([^;]+(?!(?:\\w|;)))", "&amp;$1" );

What that regex does is look for any badly formed entity references and escape their ampersand.

Specifically, (?!(?:\\w|;)) is a negative look-ahead that only lets the match end at a position where the next character is neither a word character (a-z, A-Z, 0-9, _) nor a semi-colon. The [^;]+ before it consumes everything after the & that is not a ;. Because it is greedy, it first runs up to the next semi-colon and then backs off until the look-ahead is satisfied, so a well-formed reference such as &amp; never matches and is left alone.

It puts everything except the ampersand in the first capture group so that it can be referred to in the replacement string. That's the $1.
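
To make the cleaning step concrete, here is a minimal sketch that applies the regex above and then hands the result to the JDK's DOM parser to read the root element. The class name CleanAndParse, the helper names clean and rootName, and the shortened sample string are illustrative assumptions and not part of the original answer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CleanAndParse {

    // The regex from the answer: escape '&' when it does not start
    // something shaped like "&name;".
    static String clean(String xml) {
        return xml.replaceAll("&([^;]+(?!(?:\\w|;)))", "&amp;$1");
    }

    // Clean the text, parse it with the DOM parser, and return the
    // name of the root element. Checked exceptions are wrapped so the
    // helper is easy to call from anywhere (an illustrative choice).
    static String rootName(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc =
                builder.parse(new InputSource(new StringReader(clean(xml))));
            return doc.getDocumentElement().getNodeName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the <Figure> fragment in the question.
        String xml = "<P><Figure ActualText=\"&T \">&amp;T </Figure>Checked by </P>";
        System.out.println(clean(xml));    // the bare "&T" becomes "&amp;T"
        System.out.println(rootName(xml)); // prints P
    }
}
```

If the file is read from disk rather than held in a String, the same idea applies: read the whole file into a String first, clean it, and only then give it to the parser.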

Note that this won't fix references that look valid but aren't. For example, if you had &T;, that would throw a different kind of error altogether unless the XML actually defines the entity.
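
If the incoming documents are known not to declare any entities of their own, a stricter variant can cover that case as well. The sketch below is an alternative to the answer's regex, not the answerer's method: it escapes every & that does not start one of the five predefined XML entities or a numeric character reference. The class and method names are hypothetical, and the "no custom entities" condition is an assumption you would need to confirm for your data. It also escapes each stray & on its own, whereas the original pattern can consume a second bare & inside its capture group when two of them appear before the next semi-colon.

```java
import java.util.regex.Pattern;

public class AmpersandEscaper {

    // Matches '&' unless it begins &amp; &lt; &gt; &quot; &apos;
    // or a decimal/hex character reference such as &#38; or &#x26;.
    // Assumption: the document defines no entities of its own.
    private static final Pattern STRAY_AMP = Pattern.compile(
        "&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeStrayAmpersands(String xml) {
        return STRAY_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String dirty = "<P>R&D &T; &amp; &#38; a & b & c;</P>";
        // Only the stray ampersands are rewritten; valid references stay as they are.
        System.out.println(escapeStrayAmpersands(dirty));
    }
}
```

The trade-off is that a legitimately declared custom entity would be escaped too, so this variant only fits input without a DTD or internal entity declarations.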
