How best to use XPath with very large XML files in .NET?

Question

I need to do some processing on fairly large XML files (large here meaning potentially upwards of a gigabyte) in C#, including running some complex XPath queries. The problem is that the standard way I would normally do this, through the System.Xml libraries, loads the whole file into memory before it does anything with it, which can cause memory problems with files of this size.

I don't need to update the files at all, just read them and query the data they contain. Some of the XPath queries are quite involved and span several levels of parent-child relationships. I'm not sure whether this will affect the ability to use a stream reader rather than loading the data into memory as a block.

One way I can see of making it work is to perform the simple analysis using a stream-based approach and perhaps wrap the XPath statements in XSLT transformations that I could run across the files afterward, although that seems a little convoluted.

Alternatively, I know that there are some elements that the XPath queries will never cross, so I guess I could break the document up into a series of smaller fragments based on its original tree structure, each of which could perhaps be small enough to process in memory without causing too much havoc.
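The fragment idea above could be sketched roughly as follows, using only types that ship with .NET (XmlReader, XNode.ReadFrom and the XPathSelectElements extension). This is a minimal sketch, not a tested solution for gigabyte files: the class name FragmentQuery, the method names and the idea of picking fragments by element name are all assumptions made for illustration. It streams through the file, materializes one fragment element at a time, and runs an XPath expression relative to that fragment.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper: names are illustrative, not from any library.
public static class FragmentQuery
{
    // Yields each element called fragmentName as a standalone XElement,
    // so only one fragment is held in memory at a time.
    public static IEnumerable<XElement> StreamFragments(XmlReader reader, string fragmentName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == fragmentName)
            {
                // ReadFrom consumes the whole element and leaves the reader on
                // the node that follows it, so no extra Read() is needed here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    // Runs an XPath expression against every fragment and returns the text
    // of each matching element. The XPath is evaluated relative to the
    // fragment element (for example "item/name"), not the document root.
    public static IEnumerable<string> Query(TextReader input, string fragmentName, string xpath)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            foreach (XElement fragment in StreamFragments(reader, fragmentName))
            {
                foreach (XElement hit in fragment.XPathSelectElements(xpath))
                {
                    yield return hit.Value;
                }
            }
        }
    }
}
```

Calling FragmentQuery.Query with a reader from File.OpenText on the large file, a fragment name such as "order" and a relative XPath such as "item/name" would then visit the file once from start to finish. The obvious limitation is that a query cannot reach outside its fragment, so this only works for the elements the XPath queries are known never to cross.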

I've tried to explain my objective here, so if I'm barking up totally the wrong tree in terms of general approach, I'm sure you folks can set me right...

Recommended answer

XPathReader is the answer. It isn't part of the .NET Framework class library, but it is available for download from Microsoft. Here is an MSDN article.

If you construct an XPathReader with an XmlTextReader, you get the efficiency of a streaming read with the convenience of XPath expressions.

I haven't used it on gigabyte-sized files, but I have used it on files that are tens of megabytes, which is usually enough to slow down DOM-based solutions.

Quoting from the page linked below: "The XPathReader provides the ability to perform XPath over XML documents in a streaming manner".

Download from Microsoft: http://www.microsoft.com/downloads/details.aspx?FamilyID=DB0C5FAE-111D-4B24-B10C-E4CDB13705DA&displaylang=en