Reading a big chunk of XML data from a socket and parsing it on the fly


Question

I am working on an Android client which reads a continuous stream of XML data from my Java server via a TCP socket. The server sends a '\n' character as the delimiter between consecutive responses. A model of the stream is given below.

<response1>
   <datas>
      <data>
           .....
           .....
      </data>
      <data>
           .....
           .....
      </data>
      ........
      ........
   </datas>
</response1>
    <--- \n acts as delimiter ---/>
<response2>

   <datas>
      <data>
           .....
           .....
      </data>
      <data>
           .....
           .....
      </data>
      ........
      ........
   </datas>
</response2>

I hope the structure is clear now. The server transmits this response zlib-compressed, so I first have to inflate whatever I read from the server, then split it into responses at the delimiter, and then parse each one. I am using SAX to parse my XML.

My main problem is that an XML response coming from the server can be very large (in the range of 3 to 4 MB). So:

  • To separate responses at the delimiter I have to use a StringBuilder to accumulate the response blocks as they are read from the socket, and on some phones a StringBuilder cannot hold strings in the megabyte range. It throws an OutOfMemory exception, and from threads like this I learned that keeping large strings in memory (even temporarily) is not a good idea.

Next I tried passing the inflatorReadStream (which in turn takes its data from the socket input stream) directly as the input stream of the SAX parser, without separating the XML myself and relying on SAX's ability to find the end of a document from its tags. This time one response is parsed successfully, but on reaching the '\n' delimiter SAX throws an ExpatParserParseException saying "junk after document element".

A code snippet of what I have done is given below (all unrelated try/catch blocks removed for clarity).

private Socket clientSocket = null;
DataInputStream readStream = null;
DataOutputStream writeStream = null;
private StringBuilder incompleteResponse = null;
private AppContext context = null;


public boolean connectToHost(String ipAddress, int port, AppContext myContext) {
    context = myContext;
    website = site; // 'website' and 'site' are not shown in this snippet
    InetAddress serverAddr = null;

    serverAddr = InetAddress.getByName(website.mIpAddress);

    clientSocket = new Socket(serverAddr, port);

    // If connected, create the read and write stream objects.
    readStream = new DataInputStream(new InflaterInputStream(clientSocket.getInputStream()));
    writeStream = new DataOutputStream(clientSocket.getOutputStream());

    // Parse incoming responses on a background thread.
    Thread readThread = new Thread() {
        @Override
        public void run() {
            ReadFromSocket();
        }
    };
    readThread.start();
    return true;
}


public void ReadFromSocket() {
    while (true) {
        // The inflated socket stream is handed to a fresh SAX parser on every pass.
        InputSource xmlInputSource = new InputSource(readStream);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = null;
        XMLReader xr = null;
        try {
            sp = spf.newSAXParser();
            xr = sp.getXMLReader();
            ParseHandler xmlHandler = new ParseHandler(context.getSiteListArray().indexOf(website), context);
            xr.setContentHandler(xmlHandler);
            xr.parse(xmlInputSource);
            // postSuccessfullParsingNotification();
        } catch (SAXException e) {
            // Reached when the parser hits the delimiter ("junk after document element").
            e.printStackTrace();
            postSuccessfullParsingNotification();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
            postSocketDisconnectionBroadcast();
            break;
        } catch (IOException e) {
            postSocketDisconnectionBroadcast();
            e.printStackTrace();
            break;
        } catch (Exception e) {
            postSocketDisconnectionBroadcast();
            e.printStackTrace();
            break;
        }
    }
}

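Now my questions are:

  1. Is there any way to make the SAX parser ignore junk characters after an XML response, instead of throwing an exception and closing the stream?
  2. If not, is there any way to avoid the out-of-memory error with StringBuilder? Frankly, I am not expecting a positive answer to this. Is there any workaround?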

Recommended answer

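  1. You might be able to put a wrapper around the reader or stream that you pass to the parser: a filter that detects the newline, then closes the parser and starts a new parser that continues with the stream. Your stream is NOT valid XML, and you won't be able to parse it the way you have currently implemented it. Take a look at http://commons.apache.org/io/api-release/org/apache/commons/io/input/CloseShieldInputStream.html.
  2. No.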
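
The wrapper idea from point 1 can be sketched roughly as follows. This is only an illustration under some assumptions, not the answerer's own code: DelimitedInputStream, nextSegment and DelimitedSaxDemo are hypothetical names, the delimiter byte '\n' is assumed never to occur inside a response, and the encoding is assumed to be ASCII-compatible (such as UTF-8). The wrapper reports end-of-stream at each delimiter so that the parser sees one well-formed document at a time, and it ignores close() so that the parser cannot close the socket (the same purpose CloseShieldInputStream serves).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical wrapper: reports end-of-stream at each delimiter byte. */
class DelimitedInputStream extends FilterInputStream {
    private final int delimiter;
    private boolean segmentEnded = false; // delimiter or real EOF reached
    private boolean sourceEnded = false;  // real EOF reached

    DelimitedInputStream(InputStream source, int delimiter) {
        super(new BufferedInputStream(source));
        this.delimiter = delimiter;
    }

    @Override public int read() throws IOException {
        if (segmentEnded) return -1;
        int b = in.read();
        if (b == -1) sourceEnded = true;
        if (b == -1 || b == delimiter) { segmentEnded = true; return -1; }
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        // Block for the first byte only, then take what is already available.
        while (n < len && (n == 0 || in.available() > 0)) {
            int b = read();
            if (b == -1) break;
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }

    @Override public boolean markSupported() { return false; }

    /** Ignored on purpose: the parser must not close the underlying socket. */
    @Override public void close() { }

    /** Moves to the next response; returns false once the source is exhausted. */
    boolean nextSegment() throws IOException {
        while (!segmentEnded) read(); // discard anything the parser left unread
        if (sourceEnded) return false;
        in.mark(1);                   // peek one byte to detect a trailing delimiter
        if (in.read() == -1) return false;
        in.reset();
        segmentEnded = false;
        return true;
    }
}

class DelimitedSaxDemo {
    /** Parses every response in turn; returns "rootName:dataCount" per response. */
    static List<String> parseAll(InputStream source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        DelimitedInputStream in = new DelimitedInputStream(source, '\n');
        List<String> summaries = new ArrayList<>();
        do {
            final String[] root = {null};
            final int[] dataCount = {0};
            // A fresh parser per response, as the answer suggests.
            factory.newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    if (root[0] == null) root[0] = qName;
                    if ("data".equals(qName)) dataCount[0]++;
                }
            });
            summaries.add(root[0] + ":" + dataCount[0]);
        } while (in.nextSegment());
        return summaries;
    }

    public static void main(String[] args) throws Exception {
        String wire = "<response1><datas><data>a</data><data>b</data></datas></response1>\n"
                    + "<response2><datas><data>c</data></datas></response2>\n";
        System.out.println(parseAll(
                new ByteArrayInputStream(wire.getBytes(StandardCharsets.UTF_8))));
    }
}
```

In the client from the question, the source would presumably be new InflaterInputStream(clientSocket.getInputStream()) rather than an in-memory array. Because each response is streamed through the parser as it arrives, no StringBuilder holding the whole response is needed. If the server can emit '\n' inside a response, a different framing scheme (for example a length prefix) would be required instead.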
