Java: gathering byte offsets of XML tags using an XMLStreamReader

Question

Is there a way to accurately gather the byte offsets of xml tags using the XMLStreamReader?

I have a large XML file that I need random access to. Rather than writing the whole thing to a database, I would like to run through it once with an XMLStreamReader to gather the byte offsets of significant tags, and then use a RandomAccessFile to retrieve the tag content later.
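The retrieval half of this plan can be sketched as follows. This is only an illustration, assuming the start and end byte offsets of an element have already been collected somehow; the class name ByteRangeReader and the method readRange are hypothetical, and the file is assumed to be UTF-8 with both offsets falling on character boundaries.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches a slice of a file by byte offsets.
final class ByteRangeReader {

    /**
     * Reads the bytes in [start, end) from the file and decodes them as UTF-8.
     * Assumes both offsets fall on UTF-8 character boundaries.
     */
    static String readRange(String path, long start, long end) throws IOException {
        if (start < 0 || end < start || end - start > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("invalid byte range: " + start + ".." + end);
        }
        byte[] buffer = new byte[(int) (end - start)];
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(start);        // jump straight to the first byte of the element
            file.readFully(buffer);  // throws EOFException if the range runs past the end of the file
        }
        return new String(buffer, StandardCharsets.UTF_8);
    }
}
```

A call such as ByteRangeReader.readRange(path, startOffset, endOffset) would then return the raw text of one element without re-parsing the rest of the file, which is why the offsets have to be exact byte positions rather than character counts.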

XMLStreamReader doesn't seem to have a way to track character offsets. Instead, people recommend attaching the XMLStreamReader to an input stream that tracks how many bytes have been read (the CountingInputStream provided by Apache Commons IO, for example).

For example:

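import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import org.apache.commons.io.input.CountingInputStream;

XMLInputFactory xmlStreamFactory = XMLInputFactory.newInstance();
CountingInputStream countingReader = new CountingInputStream(new FileInputStream(xmlFile));
XMLStreamReader xmlStreamReader = xmlStreamFactory.createXMLStreamReader(countingReader, "UTF-8");

while (xmlStreamReader.hasNext()) {
    int eventCode = xmlStreamReader.next();

    switch (eventCode) {
        case XMLStreamReader.END_ELEMENT:
            // getByteCount() is the number of bytes pulled from the stream so far
            System.out.println(xmlStreamReader.getLocalName() + " @" + countingReader.getByteCount());
            break;
    }
}
xmlStreamReader.close();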

Unfortunately, there must be some buffering going on, because the above code prints the same byte offset for several tags. Is there a more accurate way of tracking byte offsets in XML files (ideally without abandoning proper XML parsing)?

Recommended answer

You could use getLocation() on the XMLStreamReader (or XMLEvent.getLocation() if you use an XMLEventReader), but I remember reading somewhere that it is neither reliable nor precise. It also appears to report the end of the tag rather than its start.
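To see what getLocation() reports in practice, a small probe like the one below may help. It is a minimal sketch using only the standard javax.xml.stream API; the class name LocationProbe and the method probe are hypothetical. What the offset actually means (byte or character, start or end of the tag) depends on the StAX implementation and on whether the input is a byte stream or a Reader, so the output should be checked against a known file before relying on it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical probe: records the offset the parser reports for each tag.
final class LocationProbe {

    /** Returns one "kind name @offset" line per start or end tag. */
    static List<String> probe(String xml) throws XMLStreamException {
        List<String> lines = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                boolean isStart = event == XMLStreamConstants.START_ELEMENT;
                boolean isEnd = event == XMLStreamConstants.END_ELEMENT;
                if (isStart || isEnd) {
                    // Offset semantics are implementation-specific; -1 means "unknown"
                    int offset = reader.getLocation().getCharacterOffset();
                    lines.add((isStart ? "start " : "end ") + reader.getLocalName() + " @" + offset);
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }
}
```

Printing the result of LocationProbe.probe on a short document whose tag positions are known by hand shows whether the reported offsets point at the start or the end of each tag for the parser in use.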

I have a similar need to precisely know the location of tags within a file, and I'm looking at other parsers to see if there is one that guarantees to give the necessary level of location precision.