Read Huge XML files

Problem description

My application has two projects:

Project_1 -> Creates an XML file by reading data from many files (one time creation)

Project_2 -> Analyses the previously created XML file under various test conditions.

The first step in Project_2 is to create a data table from the XML file. This table serves as the reference for the entire project.

I used datatable.ReadXml(XML file path) for this.
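
For reference, here is a minimal sketch of the load step being timed. It assumes (the question does not say so) that Project_1 writes the table with DataTable.WriteXml(path, XmlWriteMode.WriteSchema), so the file carries an inline schema and Project_2 can call ReadXml on an empty DataTable. The Measure table, its Id/Value columns and the ReadXmlBaseline class are hypothetical.

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.IO;

// Hypothetical baseline: write a table with an inline schema (standing in for
// Project_1), then time DataTable.ReadXml on it (standing in for Project_2).
public static class ReadXmlBaseline
{
    public static DataTable BuildSample(int rowCount)
    {
        var table = new DataTable("Measure");          // WriteXml needs a TableName
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Value", typeof(double));
        for (int i = 0; i < rowCount; i++)
            table.Rows.Add(i, i * 0.5);
        return table;
    }

    public static DataTable LoadWithTiming(string path, out TimeSpan elapsed)
    {
        var table = new DataTable();                   // schema comes from the file
        var watch = Stopwatch.StartNew();
        table.ReadXml(path);                           // the call being measured
        watch.Stop();
        elapsed = watch.Elapsed;
        return table;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "measure.xml");
        BuildSample(100_000).WriteXml(path, XmlWriteMode.WriteSchema);

        DataTable loaded = LoadWithTiming(path, out TimeSpan elapsed);
        Console.WriteLine($"{loaded.Rows.Count} rows in {elapsed.TotalMilliseconds:F0} ms");
    }
}
```

Pointing LoadWithTiming at a copy of the real 250 MB file instead of the synthetic one gives a baseline to compare any alternative against.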

The problem is that the XML file is very large (nearly 250 MB).


So the datatable.ReadXml(XML file path) step takes a long time to execute (3 to 5 minutes).

I need to reduce this time by about 50%.


I tried splitting the file into 5 smaller files of 50 MB each in Project_1, and reading them on 5 threads into 5 different tables in Project_2.

It worked fine, and the reading took only a quarter of the original time. But these 5 threads created 5 different data tables, and I had to merge them all into one data table. The calls datatable.Merge(dtChaine_1) through datatable.Merge(dtChaine_5) took so much time that the earlier time saving was nullified.
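
One possible way to keep the parallel read while avoiding the five Merge calls is sketched below: read each part file into its own DataTable on a separate task, then append all rows to one table inside BeginLoadData/EndLoadData. It assumes the part files share the same inline schema and contain disjoint rows, so the key-based row matching that Merge performs is not needed. PartLoader, WritePart and the Measure schema are hypothetical, and whether this actually beats Merge on the real data would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Threading.Tasks;

public static class PartLoader
{
    // Hypothetical stand-in for Project_1: writes one part file with an inline
    // schema so ReadXml can load it into an empty DataTable.
    public static void WritePart(string path, int firstId, int count)
    {
        var table = new DataTable("Measure");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Value", typeof(double));
        for (int i = 0; i < count; i++)
            table.Rows.Add(firstId + i, i * 0.5);
        table.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    public static DataTable LoadParts(IReadOnlyList<string> partPaths)
    {
        var parts = new DataTable[partPaths.Count];

        // One task per part file, each filling its own DataTable (as in the question).
        // Each task writes only its own array slot, so no locking is needed.
        Parallel.For(0, partPaths.Count, i =>
        {
            var part = new DataTable();
            part.ReadXml(partPaths[i]);
            parts[i] = part;
        });

        // Clone copies the schema (columns, constraints) but none of the rows.
        DataTable combined = parts[0].Clone();

        // BeginLoadData turns off notifications, index maintenance and
        // constraints until EndLoadData, which suits a plain bulk append.
        combined.BeginLoadData();
        foreach (DataTable part in parts)
        {
            foreach (DataRow row in part.Rows)
                combined.LoadDataRow(row.ItemArray, true);
        }
        combined.EndLoadData();
        return combined;
    }

    public static void Main()
    {
        var paths = new List<string>();
        for (int p = 0; p < 5; p++)
        {
            string path = Path.Combine(Path.GetTempPath(), $"part_{p}.xml");
            WritePart(path, p * 1000, 1000);
            paths.Add(path);
        }

        DataTable all = LoadParts(paths);
        Console.WriteLine($"{all.Rows.Count} rows");
    }
}
```

Rows end up in part order, so the combined table keeps the original row order as long as Project_1 split the file sequentially.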


Can anyone suggest the best way to read this 250 MB XML file with minimum latency? Are there any alternatives to datatable.ReadXml()?
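
One alternative to ReadXml is to stream the file with XmlReader and fill a DataTable whose schema is declared in code, so the loader does only forward-only parsing plus row inserts. This is a sketch assuming a flat layout of Measure elements, each with Id and Value children; the element names, column types and the StreamingLoader class are hypothetical and would have to match the real file.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

public static class StreamingLoader
{
    public static DataTable Load(string path)
    {
        // Schema declared up front (hypothetical columns).
        var table = new DataTable("Measure");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Value", typeof(double));

        var settings = new XmlReaderSettings { IgnoreWhitespace = true, IgnoreComments = true };

        table.BeginLoadData();
        using (XmlReader reader = XmlReader.Create(path, settings))
        {
            // Visit each <Measure> element once, front to back.
            while (reader.ReadToFollowing("Measure"))
            {
                int id = 0;
                double value = 0;

                // ReadSubtree limits the inner reader to the current <Measure>.
                using (XmlReader row = reader.ReadSubtree())
                {
                    row.Read(); // positioned on <Measure>
                    row.Read(); // on its first child, or EOF if it is empty
                    while (!row.EOF)
                    {
                        if (row.NodeType == XmlNodeType.Element && row.Name == "Id")
                            id = row.ReadElementContentAsInt();       // also moves past </Id>
                        else if (row.NodeType == XmlNodeType.Element && row.Name == "Value")
                            value = row.ReadElementContentAsDouble(); // also moves past </Value>
                        else
                            row.Read();
                    }
                }

                table.LoadDataRow(new object[] { id, value }, true);
            }
        }
        table.EndLoadData();
        return table;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "measures.xml");
        File.WriteAllText(path,
            "<Root><Measure><Id>1</Id><Value>0.5</Value></Measure>" +
            "<Measure><Id>2</Id><Value>1.5</Value></Measure></Root>");

        DataTable table = Load(path);
        Console.WriteLine($"{table.Rows.Count} rows");
    }
}
```

The inner loop does not call Read after ReadElementContentAs..., because those methods already leave the reader on the next node; an extra Read there would silently skip the following sibling.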

Recommended answer

As John suggested: why do you need a DataTable for this? Do you really need 250 MB worth of data loaded into memory all the time? Is a DataTable the best format to store it in? Wouldn't it be better to have your own storage, in which case you get to dictate the trade-offs (complexity of storage, load time, memory consumption, ...)? Have you considered XML DOMs, especially XLinq (LINQ to XML)? Is the data read-only or read-write (I guess read-only)?
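
The "own storage" idea can be combined with XLinq without building a DOM for the whole file: XmlReader walks the document and XNode.ReadFrom materializes one small XElement per record, which is immediately mapped to a plain class. This is a sketch assuming the same hypothetical layout of Measure elements with Id and Value children; the Measure class and XLinqLoader are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical record type: the "own storage" kept in memory instead of a DataTable.
public sealed class Measure
{
    public Measure(int id, double value) { Id = id; Value = value; }
    public int Id { get; }
    public double Value { get; }
}

public static class XLinqLoader
{
    // Lazily yields one XElement per matching element; only the current
    // record is ever materialized as a LINQ to XML tree.
    public static IEnumerable<XElement> StreamElements(string path, string name)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the following node, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static List<Measure> Load(string path) =>
        StreamElements(path, "Measure")
            .Select(e => new Measure((int)e.Element("Id"), (double)e.Element("Value")))
            .ToList();

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "measures_xlinq.xml");
        File.WriteAllText(path,
            "<Root><Measure><Id>1</Id><Value>0.5</Value></Measure>" +
            "<Measure><Id>2</Id><Value>1.5</Value></Measure></Root>");

        List<Measure> measures = Load(path);
        Console.WriteLine($"{measures.Count} records, total value {measures.Sum(m => m.Value)}");
    }
}
```

Because StreamElements is lazy, the analysis could also filter or aggregate records directly in the pipeline, without holding the full list, if it does not need random access to all 250 MB of data.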

Out of curiosity, how much time are we talking about? Is it several seconds, minutes, ...?

Thanks,

