Sort multigigabyte XML file

Problem description

How can I sort all tags in a multigigabyte XML file alphabetically, with equal tags also sorted by their attributes? All methods suggested in related questions fail on data this large.

I'm looking for existing tools for Windows or Linux.

Solution

The original goal was to compare two extremely large XML files that contained similar data in a different order, so I ended up splitting each file into logical chunks. Each file contained thousands of processed documents, and I split it with the csplit utility so that each document went into a separate file. I then compared each pair of equally sized documents from the two files (luckily, no two documents within one file had the same size).

Not a perfect solution, but it worked within reasonable time and space constraints.
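
The split-then-compare approach above can be sketched roughly as follows. This is a minimal illustration in Python rather than the csplit command actually used. The function names, the chunk file naming, and the start pattern are all hypothetical. The answer does not say how each pair was compared, so a byte-for-byte comparison is assumed here.

```python
import filecmp
import re
from pathlib import Path


def split_documents(xml_path, out_dir, start_pattern):
    """Stream the file line by line and start a new chunk file at every
    line matching start_pattern (a bytes regex), similar in spirit to
    csplit. Only one line is held in memory at a time.
    Returns the number of chunk files written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    start = re.compile(start_pattern)
    count = 0
    # Chunk 0 receives whatever precedes the first matching line.
    out = (out_dir / f"chunk{count:06d}").open("wb")
    try:
        with open(xml_path, "rb") as src:
            for line in src:
                if start.search(line):
                    out.close()
                    count += 1
                    out = (out_dir / f"chunk{count:06d}").open("wb")
                out.write(line)
    finally:
        out.close()
    return count + 1


def index_by_size(chunk_dir):
    """Map file size -> path. Raises if two chunks share a size, because
    pairing by size is only unambiguous when sizes are unique."""
    by_size = {}
    for path in sorted(Path(chunk_dir).iterdir()):
        size = path.stat().st_size
        if size in by_size:
            raise ValueError(
                f"ambiguous size {size}: {by_size[size].name} and {path.name}"
            )
        by_size[size] = path
    return by_size


def compare_by_size(dir_a, dir_b):
    """Pair chunks from two directories by size and compare their bytes.
    Returns (identical pairs, differing pairs, sizes with no partner)."""
    a, b = index_by_size(dir_a), index_by_size(dir_b)
    same, different = [], []
    for size in sorted(a.keys() & b.keys()):
        pair = (a[size].name, b[size].name)
        if filecmp.cmp(a[size], b[size], shallow=False):
            same.append(pair)
        else:
            different.append(pair)
    unmatched = sorted(a.keys() ^ b.keys())
    return same, different, unmatched
```

A call such as split_documents("first.xml", "first_chunks", rb"<document ") followed by the same call for the second file and then compare_by_size("first_chunks", "second_chunks") would reproduce the workflow, assuming each document starts on its own line with a recognizable opening tag. Note that the closing root tag ends up in the last chunk, so the final document of each file may not find a partner if the document order differs.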
