How to do command line XPath queries in huge XML files?
Question
I have a collection of XML files, and some of them are pretty big (up to ~50 million element nodes). I am using xmllint for validating those files, which works pretty nicely even for the huge ones thanks to the streaming API.
xmllint --loaddtd --stream --valid /path/to/huge.xml
I recently learned that xmllint is also capable of doing command line XPath queries, which is very handy.
xmllint --loaddtd --xpath '/root/a/b/c/text()' /path/to/small.xml
However, these XPath queries do not work for the huge XML files. I just receive a "Killed" message after some time. I tried to enable the streaming API, but this just leads to no output at all.
xmllint --loaddtd --stream --xpath '/root/a/b/c/text()' /path/to/huge.xml
Is there a way to enable streaming mode when doing XPath queries using xmllint? Are there other/better ways to do command line XPath queries for huge XML files?
Answer
If your XPath expressions are very simple, try xmlcutty.
From the homepage:
xmlcutty is a simple tool for carving out elements from large XML files, fast. Since it works in a streaming fashion, it uses almost no memory and can process around 1G of XML per minute.