Java XML Parsing and original byte offsets

Question

I'd like to parse some well-formed XML into a DOM, but I'd like to know the offset of each node's tag in the original media.

For example, if I had an XML document with content something like:

<html>
<body>
<div>text</div>
</body>
</html>

I'd like to know that the <div> node starts at offset 13 in the original media, and (more importantly) that "text" starts at offset 18.

Is this possible with standard Java XML parsers? JAXB? If no solution is easily available, what type of changes are necessary along the parsing path to make this possible?

Answer

The SAX API provides a rather obscure mechanism for this: the org.xml.sax.Locator interface. When you use the SAX API, you subclass DefaultHandler and pass that to the SAX parse methods, and the SAX parser implementation is supposed to inject a Locator into your DefaultHandler via setDocumentLocator(). As parsing proceeds, the various callback methods on your ContentHandler are invoked (e.g. startElement()), at which point you can consult the Locator to find out the parsing position (via getLineNumber() and getColumnNumber()). Note that this gives you line and column numbers, not a raw character or byte offset.
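
A minimal sketch of that pattern, assuming the JDK's built-in SAX parser (obtained through SAXParserFactory). The class and method names (LocatorDemo, PositionHandler, parse) are hypothetical. Per the Locator javadoc, the reported position is where the current event ends and is only valid inside a callback, so for startElement() you get the position just after the start tag rather than the position of its opening "<".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo {

    /** Hypothetical handler that records "event @ line:column" using the injected Locator. */
    static class PositionHandler extends DefaultHandler {
        private Locator locator; // supplied by the parser; optional per the SAX spec
        final List<String> events = new ArrayList<>();

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        // Only meaningful while a callback is running; reports where the current event ends.
        private String where() {
            return locator == null ? "?" : locator.getLineNumber() + ":" + locator.getColumnNumber();
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            events.add("start " + qName + " @ " + where());
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) { // skip whitespace-only runs between tags
                events.add("text \"" + text + "\" @ " + where());
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PositionHandler handler = new PositionHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html>\n<body>\n<div>text</div>\n</body>\n</html>";
        for (String event : parse(xml)) {
            System.out.println(event);
        }
    }
}
```

If you need a character offset, one option is to record where each line starts in the original text and add the column to that. For a byte offset you would additionally have to account for the document's encoding.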

Technically, this is optional functionality, but the javadoc says that implementations are "strongly encouraged" to provide it, so you can likely assume the SAX parser built into Java SE will do it.

Of course, this does mean using the SAX API, which is no one's idea of fun, but I can't see a way of accessing this information using a higher-level API.
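
If you still want a DOM, one workaround is to drive the SAX parse yourself, build the DOM nodes by hand, and attach the Locator position to each element with org.w3c.dom.Node.setUserData(). This is still SAX underneath, not a higher-level API. The sketch below assumes a non-namespace-aware parse. The class name PositionalDom and the user-data keys "lineNumber"/"columnNumber" are hypothetical choices. It stores line/column (where the start tag ends), not byte offsets, and it ignores comments and processing instructions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class PositionalDom {
    // Hypothetical user-data keys; any strings would do.
    static final String LINE_KEY = "lineNumber";
    static final String COL_KEY = "columnNumber";

    static Document parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Deque<Node> stack = new ArrayDeque<>(); // current parent is on top
        stack.push(doc);
        StringBuilder text = new StringBuilder(); // buffers character chunks until the next tag

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator l) {
                locator = l;
            }

            // Turn buffered characters into a single Text node under the current parent.
            private void flushText() {
                if (text.length() > 0) {
                    stack.peek().appendChild(doc.createTextNode(text.toString()));
                    text.setLength(0);
                }
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                flushText();
                Element el = doc.createElement(qName);
                for (int i = 0; i < attrs.getLength(); i++) {
                    el.setAttribute(attrs.getQName(i), attrs.getValue(i));
                }
                if (locator != null) { // position where this start tag ends
                    el.setUserData(LINE_KEY, locator.getLineNumber(), null);
                    el.setUserData(COL_KEY, locator.getColumnNumber(), null);
                }
                stack.peek().appendChild(el);
                stack.push(el);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                stack.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html>\n<body>\n<div>text</div>\n</body>\n</html>");
        Element div = (Element) doc.getElementsByTagName("div").item(0);
        System.out.println("<div> start tag ends at line " + div.getUserData(LINE_KEY)
                + ", column " + div.getUserData(COL_KEY));
    }
}
```

Because the positions live in DOM user data rather than in attributes, they do not show up if the document is serialized again.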

EDIT: found this example.