Extracting a small subset of data from XMLs


Question

I am writing a C# / VB program that is to be used for reporting data based upon information received in XMLs.

My situation is that I receive many XMLs per month (about 100-200), each ranging in size from 10 MB to 350 MB. For each of these XMLs, I only need a small subset of its data (less than 5% of any one file's entire data) to produce the necessary reports.

Also, that subset of data will always be held in the same key structure (it may exist within multiple keys and at differing levels down, but it will always exist within the same key names, and the keys containing it will always have the same attributes, such as "name", etc.)

So, my current idea of how to go about doing this is to:

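  1. Create a scraper that uses XPath to pull the necessary data out of the XMLs.
  2. Store that small subset of data in a SQL Server table, along with file-characteristic data stored in a separate table, so as to know which file the scraped data came from.
  3. Query the data out into a program to report on it.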

My main question here is really what is the best way to scrape that data out? I am most familiar with XPath, but for multiple files of 200MB in size, I'm afraid of performance issues loading in the entire file.

Other things I have seen / researched are:

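  1. Creating an XSLT file to transform / pull out only the data I want from the XML
  2. Using LINQ to XML
  3. Somehow linking the XMLs to SQL Server and then being able to query them directly
  4. Using ADO within the program to query the XMLs
  5. Using the XmlReader class to do it (rather than loading each XML in full)
  6. Perhaps there is a native .NET component that already does this well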

Quite honestly, I just have no clue what the standard is given the high number of XMLs and the large variance in file sizes and I'm not familiar with any of the other ways of doing this - such as, for example, linking the XMLs to SQL Server directly / using ADO to query the XML - and, therefore, don't know of their possible benefits / drawbacks.

If any of you have been in a similar situation, I'd really appreciate any kind of pointers in the right direction / at least validation that my method isn't the worst one out there :)

Thanks!

Recommended answer

As for the memory consumption and performance concerns, a nice feature of the .NET XML APIs is that you can combine XmlReader with XPathDocument, XmlDocument, or XElement to selectively read only part of a document into memory and then have the XPath or LINQ to XML features available on that part. LINQ to XML has XNode.ReadFrom (http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom%28v=vs.110%29.aspx) for doing that; DOM/XmlDocument has XmlDocument.ReadNode (http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode%28v=vs.110%29.aspx). So, depending on your XML structure, you might be able to use an XmlReader to read forward through the XML quickly without consuming much memory and then, when you reach an element you are interested in, read it into an XElement (LINQ to XML) or XmlNode (DOM) and apply LINQ to XML and/or XPath to read out the details.
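A minimal sketch of the XmlReader plus XNode.ReadFrom combination described above might look like the following. The element name `record`, the `name` attribute, the `value` child, and the helper names `XmlSubsetReader` and `StreamElements` are all made up for illustration; the question does not show the real XML structure, so they would need to be adapted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlSubsetReader
{
    // Hypothetical helper: reads forward through the XML and yields every
    // element with the given local name as a standalone XElement.
    // Only the matched subtree is held in memory at any one time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // XNode.ReadFrom builds the subtree and leaves the reader
                    // positioned just after it, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Tiny in-memory stand-in for one of the large files (assumed structure).
        const string xml =
            "<root><skip>x</skip>" +
            "<record name=\"a\"><value>1</value></record>" +
            "<group><record name=\"b\"><value>2</value></record></group>" +
            "</root>";

        foreach (XElement record in XmlSubsetReader.StreamElements(new StringReader(xml), "record"))
        {
            // LINQ to XML is available on the small fragment.
            Console.WriteLine((string)record.Attribute("name") + " = " + (string)record.Element("value"));
        }
    }
}
```

For real files, `XmlReader.Create` also accepts a file path or a `Stream`, so the same loop can run over each file in turn. One caveat with this sketch: if a matching element is nested inside another matching element, the inner one is consumed as part of the outer fragment and is not yielded separately.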
