Concurrent SAX processing of large, simple XML files?


Problem description

I have a couple of gigantic XML files (10GB-40GB) that have a very simple structure: just a single root node containing multiple row nodes. I'm trying to parse them using SAX in Python, but the extra processing I have to do for each row means that the 40GB file takes an entire day to complete. To speed things up, I'd like to use all my cores simultaneously. Unfortunately, it seems that the SAX parser can't deal with "malformed" chunks of XML, which is what you get when you seek to an arbitrary line in the file and try parsing from there. Since the SAX parser can accept a stream, I think I need to divide my XML file into eight different streams, each containing [number of rows]/8 rows and padded with fake opening and closing tags. How would I go about doing this? Or — is there a better solution that I might be missing? Thank you!
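
For concreteness, here is a minimal sketch of the splitting idea the question describes. It assumes, hypothetically, that each record is a <row> element on its own line under a single <root> element and that the file is named data.xml; a production version would record byte offsets and stream each range rather than load every row into memory.

import multiprocessing
import xml.sax


class RowCounter(xml.sax.ContentHandler):
    """Stand-in handler: counts rows instead of doing real work."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def startElement(self, name, attrs):
        if name == "row":
            self.rows += 1


def parse_chunk(lines):
    # Pad the chunk with fake root tags so the parser sees well-formed XML.
    chunk = "<root>" + "".join(lines) + "</root>"
    handler = RowCounter()
    xml.sax.parseString(chunk.encode("utf-8"), handler)
    return handler.rows


def split_rows(path, n_chunks):
    # Naive split: materializes the row lines in memory. For 10GB-40GB
    # files you would instead seek to byte offsets and stream each range.
    with open(path, encoding="utf-8") as f:
        rows = [line for line in f if "<row" in line]
    size = (len(rows) + n_chunks - 1) // n_chunks
    return [rows[i:i + size] for i in range(0, len(rows), size)]


if __name__ == "__main__":
    chunks = split_rows("data.xml", 8)  # hypothetical file name
    with multiprocessing.Pool(8) as pool:
        print("rows per chunk:", pool.map(parse_chunk, chunks))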

Recommended answer

You can't easily split the SAX parsing into multiple threads, and you don't need to: if you just run the parse without any other processing, it should run in 20 minutes or so. Focus on the processing you do to the data in your ContentHandler.
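
One hedged sketch of that advice, under the same assumptions as above (rows are <row> elements, the file name data.xml is hypothetical): keep a single SAX pass, but hand each completed row to a multiprocessing pool so the expensive per-row work runs on all cores while parsing continues.

import multiprocessing
import xml.sax


def process_row(text):
    # Stand-in for the expensive per-row work; replace with your own logic.
    return len(text)


class RowHandler(xml.sax.ContentHandler):
    """Parses on one core, farms each finished row out to worker processes."""

    def __init__(self, pool):
        super().__init__()
        self.pool = pool
        self.buffer = []
        self.pending = []

    def startElement(self, name, attrs):
        if name == "row":
            self.buffer = []

    def characters(self, content):
        self.buffer.append(content)

    def endElement(self, name):
        if name == "row":
            # Submit asynchronously so parsing never waits on processing.
            # For very long runs, drain self.pending periodically instead of
            # letting it grow without bound.
            text = "".join(self.buffer)
            self.pending.append(self.pool.apply_async(process_row, (text,)))


if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        handler = RowHandler(pool)
        xml.sax.parse("data.xml", handler)  # hypothetical file name
        results = [r.get() for r in handler.pending]
    print("processed", len(results), "rows")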
