How to parse badly formed XML in Java?


Question

I have XML that I need to parse, but I have no control over how it is created. Unfortunately it's not very strict XML and contains things like:

<mytag>This won't parse & contains an ampersand.</mytag>

The javax.xml.stream classes don't like this at all, and rightly fail with:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[149,50]
Message: The entity name must immediately follow the '&' in the entity reference.

How can I work around this? I can't change the XML, so I guess I need an error-tolerant parser.

My preference would be for a fix that doesn't require too much disruption to the existing parser code.

Recommended answer

If it's not well-formed XML (like the above), then no conforming XML parser will accept it, as you've found. If you know the scope of the errors (such as the unescaped ampersand above), then the simplest solution may be to run a correcting pass over the input (for example, escaping bare ampersands as &amp;) and then feed the result to your existing parser.
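A correcting pass of that kind might look like the following. This is only a minimal sketch, assuming the sole defect is a bare `&` in text content; the class name `AmpersandFixer` and its two helper methods are made up for illustration and are not part of any library. It escapes every `&` that does not already begin an entity or character reference, then hands the result to the standard `javax.xml.stream` parser unchanged.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: repairs bare ampersands before parsing.
final class AmpersandFixer {

    // Matches an '&' that is NOT the start of a named reference (&name;),
    // a decimal character reference (&#38;) or a hex one (&#x26;).
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare '&' with the predefined entity "&amp;".
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Stand-in for the existing parser code: collects all character data.
    static String readText(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.CHARACTERS) {
                text.append(reader.getText());
            }
        }
        reader.close();
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String broken = "<mytag>This won't parse & contains an ampersand.</mytag>";
        System.out.println(readText(escapeBareAmpersands(broken)));
    }
}
```

The sketch has known limits. It would also rewrite ampersands inside CDATA sections and comments, where they are legal as-is. Text that merely looks like a reference (say `AT&T;`) is left alone and would still fail as an undeclared entity. For large inputs the same replacement would have to be applied in a streaming fashion rather than on a whole `String`.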

Otherwise you'll have to write a parser yourself with built-in support for such anomalies, and I can't believe that would be anything other than a tedious and error-prone task.
