Parallel XML Parsing in Java

Views: 59 Posted: 2020/5/13 21:25:19 Tags: java xml multithreading parallel-processing xml-parsing

Question

I'm writing an application which processes a lot of XML files (>1000) with deep node structures. It takes about six seconds with Woodstox (Event API) to parse a file with 22,000 nodes.

The algorithm runs in a process with user interaction, where only a response time of a few seconds is acceptable. So I need to improve my strategy for handling the XML files.

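My process parses the XML files (extracting only a few nodes).
It then processes the extracted nodes and writes the new result into a new data stream (producing a copy of the document with the modified nodes).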

Now I'm thinking about a multithreaded solution (which scales better on 16-core+ hardware). I thought about the following strategies:

Creating multiple parsers and running them in parallel on the XML sources.
Rewriting my parsing algorithm to be thread-safe so that it uses only one instance of the parser (factories, ...)
Splitting the XML source into chunks and assigning the chunks to multiple processing threads (map-reduce xml - serial)
Optimizing my algorithm (a better StAX parser than Woodstox?) / Using a parser with built-in concurrency

I want to improve both the overall performance and the "per file" performance.

Do you have experience with such problems? What is the best way to go?

Recommended answer

This one is obvious: just create several parsers and run them in parallel in multiple threads.
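A minimal sketch of that first strategy, assuming each file is parsed independently and the per-file work can be expressed as a function of one document. It uses the standard `javax.xml.stream` (StAX) API, which Woodstox implements; the names `ParallelParse`, `countElements`, and `parseAll` are hypothetical, and counting start elements merely stands in for the real node extraction. Each worker thread gets its own `XMLInputFactory`, so the sketch does not depend on the factory being thread-safe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class ParallelParse {

    // One factory per worker thread (an assumption made for safety, not a Woodstox requirement).
    private static final ThreadLocal<XMLInputFactory> FACTORY =
            ThreadLocal.withInitial(XMLInputFactory::newInstance);

    // Placeholder for the real per-file work: counts the start elements in one document.
    static int countElements(String xml) throws Exception {
        XMLStreamReader reader = FACTORY.get().createXMLStreamReader(new StringReader(xml));
        try {
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            return count;
        } finally {
            reader.close();
        }
    }

    // Parses every document on a fixed-size pool; results keep the input order.
    static List<Integer> parseAll(List<String> documents, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String doc : documents) {
                futures.add(pool.submit(() -> countElements(doc)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

This improves throughput across many files but does nothing for the time taken by a single file, since each document is still parsed by one thread.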

Take a look at Woodstox Performance (down at the moment, try google cache).

This can be done IF the structure of your XML is predictable, that is, if it consists of many identical top-level elements. For instance:

<element>
    <more>more elements</more>
</element> 
<element>
    <other>other elements</other>
</element>

In this case you could create a simple splitter that searches for <element> and feeds each such part to a particular parser instance. That's a simplified approach: in real life I'd use RandomAccessFile to find the start and stop points (<element>) and then create a custom FileInputStream that operates on just one part of the file.
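The simplified splitter could look roughly like the sketch below. It works on an in-memory string rather than on `RandomAccessFile` offsets, and it assumes the repeated element has no attributes, is never nested inside itself, and that its tag text never appears inside comments or CDATA. The names `ChunkSplitter`, `split`, `firstChildName`, and `parseChunks` are hypothetical, and returning the first child's name only stands in for the real per-chunk processing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class ChunkSplitter {

    // Cuts the source into "<tag>...</tag>" pieces by plain text search (no XML parsing).
    static List<String> split(String source, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> chunks = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = source.indexOf(open, from);
            if (start < 0) break;
            int end = source.indexOf(close, start);
            if (end < 0) break;
            end += close.length();
            chunks.add(source.substring(start, end));
            from = end;
        }
        return chunks;
    }

    // Placeholder work: parses one chunk as a standalone document and
    // returns the name of its first child element, or null if it has none.
    static String firstChildName(String chunk) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(chunk));
        try {
            int starts = 0;
            while (reader.hasNext()) {
                // The second start element seen is always the first child of the chunk's root.
                if (reader.next() == XMLStreamConstants.START_ELEMENT && ++starts == 2) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    // Splits the source and parses every chunk on its own parser instance in parallel.
    static List<String> parseChunks(String source, String tag, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String chunk : split(source, tag)) {
                tasks.add(() -> firstChildName(chunk));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                names.add(future.get()); // invokeAll keeps the task order
            }
            return names;
        } finally {
            pool.shutdown();
        }
    }
}
```

For a large file, the same idea would record byte offsets of the start and stop points instead of copying substrings, so that each thread reads only its own region of the file.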

Take a look at Aalto. It comes from the same people who created Woodstox. They are experts in this area, so don't reinvent the wheel.
