Splitting a large XML file in Python

Question

I'm looking to split a huge XML file into smaller pieces. I'd like to scan through the file looking for a specific tag, grab everything between that tag's opening and closing markers, save it to a file, and then continue through the rest of the file.

My issue is finding a clean way to note where the tag starts and ends, so that I can grab the text inside as I scan through the file with "for line in f".

I'd rather not use sentinel variables. Is there a Pythonic way to get this done?

The file is too big to read into memory.

Recommended answer

There are two common ways to handle XML data.

One is called DOM, which stands for Document Object Model. This style of XML parsing is probably what you have seen when looking at documentation. It reads the entire XML document into memory to build the object model, which rules it out for a file that does not fit in memory.

The second is called SAX, which is a streaming method. The parser reads the XML incrementally and signals your code about certain events, e.g. when a new start tag is found.
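To make the event model concrete, here is a minimal sketch using the standard library's xml.sax module. The EventLogger class and the sample document are made up for illustration; only ContentHandler, startElement, endElement and parseString come from xml.sax itself.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records start/end tag events in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is read.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called by the parser each time a closing tag is read.
        self.events.append(("end", name))


handler = EventLogger()
# Illustrative input; a real file would go through xml.sax.parse(path, handler).
xml.sax.parseString(b"<root><item>a</item><item>b</item></root>", handler)
print(handler.events)
```

The parser never builds a tree here; the handler sees each tag once, in document order, and decides what to keep.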

So SAX is clearly what you need for your situation. SAX parsers can be found in the Python standard library under xml.sax and xml.parsers.expat.
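As a rough sketch of how this could look for the splitting task, the handler below copies every element with a given tag name into its own file. It is one possible approach, not the answer author's code: ElementSplitter, split_xml, the "record" tag and the part_NNNN.xml naming are all assumptions made for illustration. It relies on xml.sax.parse, which reads the source incrementally, and on XMLGenerator from xml.sax.saxutils to re-serialize the events.

```python
import os
import tempfile
import xml.sax
from xml.sax.saxutils import XMLGenerator


class ElementSplitter(xml.sax.ContentHandler):
    """Hypothetical handler: writes each <tag>...</tag> element to its own file."""

    def __init__(self, tag, out_dir):
        super().__init__()
        self.tag = tag
        self.out_dir = out_dir
        self.depth = 0      # nesting depth inside the element being copied
        self.count = 0      # number of output files created so far
        self.out = None     # currently open output file, if any
        self.writer = None  # XMLGenerator wrapping self.out, if any

    def startElement(self, name, attrs):
        if self.depth == 0 and name == self.tag:
            # A new target element begins: open the next numbered file.
            self.count += 1
            path = os.path.join(self.out_dir, f"part_{self.count:04d}.xml")
            self.out = open(path, "w", encoding="utf-8")
            self.writer = XMLGenerator(self.out, encoding="utf-8")
            self.writer.startDocument()
        if self.writer is not None:
            self.depth += 1
            self.writer.startElement(name, attrs)

    def characters(self, content):
        # Text is forwarded only while a target element is open.
        if self.writer is not None:
            self.writer.characters(content)

    def endElement(self, name):
        if self.writer is None:
            return
        self.writer.endElement(name)
        self.depth -= 1
        if self.depth == 0:
            # The target element is complete: finish and close its file.
            self.writer.endDocument()
            self.out.close()
            self.out = self.writer = None


def split_xml(source, tag, out_dir):
    """Stream `source` and return how many `tag` elements were written."""
    handler = ElementSplitter(tag, out_dir)
    xml.sax.parse(source, handler)
    return handler.count


# Small demonstration on a throwaway file (sample data is made up).
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "big.xml")
    with open(source, "w", encoding="utf-8") as f:
        f.write("<root><record id='1'>a</record><skip/><record id='2'>b</record></root>")
    print(split_xml(source, "record", tmp))
```

The handler still keeps a little state (the depth counter), but it lives inside the handler object rather than in a line-by-line loop, and the parser, not string matching, decides where a tag starts and ends. Namespaces, comments and processing instructions are not handled in this sketch.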
