Issue Parsing XML Document using SaxParser - 2047 character limit?


Question

I have created a class that extends the SAX DefaultHandler class. My intent is to store the XML input in a series of objects while preserving the data integrity of the original XML data. During testing, I noticed that some of the node data was being truncated arbitrarily on input.

For example:

Input: <temperature>-125</temperature>  Output: <sensitivity>5</sensitivity>
Input: <address>101_State</address>     Output: <address>te</address>

To further complicate things, the above errors occur "randomly" for about 1 out of every ~100 instances of the same XML tag. That is, the input XML file has roughly 100 tags that contain <temperature>-125</temperature>, but only one of them produces an output of <sensitivity>5</sensitivity>. The other tags accurately produce <sensitivity>-125</sensitivity>.

I have overridden the "characters(char[] ch, int start, int length)" method to simply grab the character content between XML tags:

    public void characters(char[] ch, int start, int length)
            throws SAXException {

        value = new String(ch, start, length);

        // debug
        System.out.println("'" + value + "'" + "start: " + start + "length: " + length);
    }

For the specific temperature tag that results in erroneous output, my println statement produces the following:

> '-12'start: 2045length: 3
> '5'start: 0length: 1

This tells me that the characters method is being called twice for this specific XML element, while it is called once for all other XML tags. The "start" value of the second line suggests that the char[] is being reset in the middle of this XML tag, and that the characters method is then called again with the new char[].

Is anyone familiar with this issue? I wondered whether I was reaching the limit of a char[]'s capacity, but a quick search makes that seem unlikely. My char[] seems to be resetting at ~2047 characters.

Thanks

LB

Recommended answer

The characters callback method need not be provided with a complete chunk of data by the SAX Parser. The parser could invoke the characters() method multiple times, sending a chunk of data at a time.

The resolution is to accumulate all the data in a buffer until the parser calls a different (non-characters) method, such as endElement().
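As a minimal sketch of that approach (not the asker's actual class): the handler below appends every characters() chunk to a StringBuilder, clears it in startElement(), and only reads the accumulated text in endElement(). The names AccumulatingHandlerDemo, AccumulatingHandler, and parseValues, and the "name=text" output format, are hypothetical and chosen for illustration. It also assumes simple leaf elements without mixed content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AccumulatingHandlerDemo {

    // Hypothetical handler: records "elementName=text" for each element with text.
    static class AccumulatingHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        final List<String> values = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            // A new element begins: discard whatever was buffered before it.
            buffer.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append, never overwrite.
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Only now is the element's text known to be complete.
            String text = buffer.toString().trim();
            if (!text.isEmpty()) {
                values.add(qName + "=" + text);
            }
            buffer.setLength(0);
        }
    }

    // Parses the given XML string and returns the collected "name=text" entries.
    static List<String> parseValues(String xml) throws Exception {
        AccumulatingHandler handler = new AccumulatingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        // Build a document long enough to span several internal parser buffers.
        StringBuilder xml = new StringBuilder("<readings>");
        for (int i = 0; i < 500; i++) {
            xml.append("<temperature>-125</temperature>");
        }
        xml.append("</readings>");

        List<String> values = parseValues(xml.toString());
        boolean intact = values.stream().allMatch("temperature=-125"::equals);
        System.out.println(values.size() + " values, all intact: " + intact);
    }
}
```

Because the text is read only in endElement(), it does not matter how the parser splits the content across characters() calls, so the value stays intact regardless of where the parser's internal buffer happens to end.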
