Java XML Parsing and original byte offsets
Question
I'd like to parse some well-formed XML into a DOM, but I'd like to know the offset of each node's tag in the original media.
For example, if I had an XML document with the content something like:
<html>
<body>
<div>text</div>
</body>
</html>
I'd like to know that the <div> node starts at offset 13 in the original media, and (more importantly) that "text" starts at offset 18.
Is this possible with standard Java XML parsers? JAXB? If no solution is easily available, what type of changes are necessary along the parsing path to make this possible?
Recommended answer
The SAX API provides a rather obscure mechanism for this: the org.xml.sax.Locator interface. When you use the SAX API, you subclass DefaultHandler and pass that to the SAX parse methods, and the SAX parser implementation is supposed to inject a Locator into your DefaultHandler via setDocumentLocator(). As the parsing proceeds, the various callback methods on your ContentHandler are invoked (e.g. startElement()), at which point you can consult the Locator to find out the parsing position (via getColumnNumber() and getLineNumber()).
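The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name LocatorDemo, the handler PositionHandler, and the helper elementPositions() are made up for this example, and it assumes the parser actually supplies a Locator. Note that the Locator javadoc describes the reported position as that of the first character after the text of the event, and as an approximation intended for diagnostics, so the column seen in startElement() generally points just past the start tag rather than at its opening "<".

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo {

    /** Hypothetical handler: records "name@line:column" for every start tag. */
    static class PositionHandler extends DefaultHandler {
        final List<String> positions = new ArrayList<>();
        private Locator locator; // stays null if the parser does not supply one

        @Override
        public void setDocumentLocator(Locator locator) {
            // Called by the parser before any other callback, if it supports locators.
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (locator != null) {
                positions.add(qName + "@" + locator.getLineNumber()
                        + ":" + locator.getColumnNumber());
            }
        }
    }

    /** Parses the given XML string and returns the recorded start-tag positions. */
    static List<String> elementPositions(String xml) {
        PositionHandler handler = new PositionHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return handler.positions;
    }

    public static void main(String[] args) {
        String xml = "<html>\n<body>\n<div>text</div>\n</body>\n</html>";
        elementPositions(xml).forEach(System.out::println);
    }
}
```

The same Locator can be consulted from characters() to get an approximate position for text such as "text", again as a line/column pair rather than an offset.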
Technically, this is optional functionality, but the javadoc says that implementations are "strongly encouraged" to provide it, so you can likely assume the SAX parser built into Java SE will do it.
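Since the Locator reports line and column numbers rather than offsets, one possible way to get back to an offset is to convert the pair yourself against the original text. The helper below is a hypothetical sketch (the names LineColumnOffset and toCharOffset are invented here). It assumes 1-based lines and columns, lines separated by a single '\n', and it yields a 0-based character offset, not a byte offset; mapping to bytes would additionally depend on the document's encoding. Because it counts each newline as one character, its results may not match the offsets quoted in the question.

```java
public class LineColumnOffset {

    /**
     * Converts a 1-based line/column pair into a 0-based character offset,
     * assuming lines are separated by a single '\n'.
     */
    static int toCharOffset(String text, int line, int column) {
        int lineStart = 0;
        // Skip (line - 1) newlines to find where the requested line begins.
        for (int i = 1; i < line; i++) {
            int newline = text.indexOf('\n', lineStart);
            if (newline < 0) {
                throw new IllegalArgumentException("text has fewer than " + line + " lines");
            }
            lineStart = newline + 1;
        }
        return lineStart + column - 1;
    }

    public static void main(String[] args) {
        String xml = "<html>\n<body>\n<div>text</div>\n</body>\n</html>";
        // Line 3, column 6 is the character right after "<div>" in this string.
        int offset = toCharOffset(xml, 3, 6);
        System.out.println(offset + " -> " + xml.charAt(offset)); // prints "19 -> t"
    }
}
```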
Of course, this does mean using the SAX API, which is no one's idea of fun, but I can't see a way of accessing this information using a higher-level API.
Edit: found this example.