How to do command line XPath queries in huge XML files?

Question

I have a collection of XML files, and some of them are pretty big (up to ~50 million element nodes). I am using xmllint for validating those files, which works pretty nicely even for the huge ones thanks to the streaming API.

xmllint --loaddtd --stream --valid /path/to/huge.xml

I recently learned that xmllint is also capable of doing command line XPath queries, which is very handy.

xmllint --loaddtd --xpath '/root/a/b/c/text()' /path/to/small.xml

However, these XPath queries do not work for the huge XML files. I just receive a "Killed" message after some time. I tried to enable the streaming API, but this just leads to no output at all.

xmllint --loaddtd --stream --xpath '/root/a/b/c/text()' /path/to/huge.xml

Is there a way to enable streaming mode when doing XPath queries using xmllint? Are there other/better ways to do command line XPath queries for huge XML files?

Recommended answer

If your XPath expressions are very simple, try xmlcutty.

From the homepage:

xmlcutty is a simple tool for carving out elements from large XML files, fast. Since it works in a streaming fashion, it uses almost no memory and can process around 1G of XML per minute.
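
To illustrate what "working in a streaming fashion" can look like for a very simple path such as /root/a/b/c, here is a minimal Python sketch built on the standard library's xml.etree.ElementTree.iterparse. It is not how xmlcutty is implemented, and the function name stream_text is made up for this example. It assumes a plain slash-separated path (no predicates, wildcards, or namespaces) and returns only the text that precedes an element's first child, which is narrower than XPath's text().

```python
import io
import xml.etree.ElementTree as ET


def stream_text(source, path):
    """Yield the leading text of every element whose absolute path equals `path`.

    `source` is a filename or a binary file object.
    `path` is a simple path like '/root/a/b/c' (hypothetical mini-syntax,
    not real XPath: no predicates, wildcards, axes, or namespace prefixes).
    """
    target = [part for part in path.split("/") if part]
    tags = []   # tag names of the currently open elements
    elems = []  # the matching Element objects, parallel to `tags`
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            tags.append(elem.tag)
            elems.append(elem)
        else:
            if tags == target:
                # elem.text is the text before the first child element only.
                yield elem.text or ""
            tags.pop()
            elems.pop()
            if elems:
                # Detach the finished element from its parent so that
                # memory use stays roughly proportional to document depth.
                elems[-1].remove(elem)


# Small in-memory demonstration; for a real file, pass its path instead.
sample = b"<root><a><b><c>one</c><c>two</c></b></a><a><c>skip</c></a></root>"
for text in stream_text(io.BytesIO(sample), "/root/a/b/c"):
    print(text)
```

Detaching each finished element from its parent is what keeps memory bounded here; without that step, iterparse would still build the whole tree in memory, much like a non-streaming XPath evaluation.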
