Converting very large files from XML to CSV
Question
Currently I'm using the following code snippet to convert a .txt file containing XML data to .CSV format. This works perfectly with files of around 100-200 MB, and the conversion time is very low (1-2 minutes at most). However, I now need it to work for much bigger files (1-2 GB each). With files that size, the program freezes the computer and the conversion takes about 30-40 minutes. I'm not sure how to go about changing this function. Any help will be appreciated!
string all_lines = File.ReadAllText(p);
all_lines = "<Root>" + all_lines + "</Root>";
XmlDocument doc_all = new XmlDocument();
doc_all.LoadXml(all_lines);
StreamWriter write_all = new StreamWriter(FILENAME1);
XmlNodeList rows_all = doc_all.GetElementsByTagName("XML");
foreach (XmlNode rowtemp in rows_all)
{
    List<string> children_all = new List<string>();
    foreach (XmlNode childtemp in rowtemp.ChildNodes)
    {
        children_all.Add(Regex.Replace(childtemp.InnerText, "\\s+", " "));
    }
    write_all.WriteLine(string.Join(",", children_all.ToArray()));
}
write_all.Flush();
write_all.Close();
Sample input:
<XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>
last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>
<XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>
last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>
Sample output:
1,4,7,,5,hello,there,my,name,is,jack,,last,name,missing,above,3,6,7,,8,4
1,5,7,,3,hello,there,my,name,is,mary,jane,last,name,not,missing,above,3,6,7,,8,4
Recommended answer
You need to take a streaming approach, as you're currently reading the entire 2 GB file into memory and then processing it. Instead, read a bit of XML, write a bit of CSV, and keep doing that until you've processed the whole file.
A possible solution is below:
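// r is the path of the input file (p in the question's code).
using (var writer = new StreamWriter(FILENAME1))
{
    foreach (var element in StreamElements(r, "XML"))
    {
        // Collect every text node under the element, collapsing runs of whitespace.
        var values = element.DescendantNodes()
            .OfType<XText>()
            .Select(e => Regex.Replace(e.Value, "\\s+", " "));
        var line = string.Join(",", values);
        writer.WriteLine(line);
    }
}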
Where StreamElements is inspired by Jon Skeet's streaming of XElements from an XmlReader in an answer to this question (http://stackoverflow.com/questions/2441673/reading-xml-with-xmlreader-in-c-sharp). I've made some changes to support your 'invalid' XML (as you have no root element):
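private static IEnumerable<XElement> StreamElements(string fileName, string elementName)
{
    // Fragment conformance lets the reader accept a file with no single root element.
    var settings = new XmlReaderSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment
    };
    using (XmlReader reader = XmlReader.Create(fileName, settings))
    {
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
            {
                // XNode.ReadFrom consumes the whole element and leaves the reader on
                // the node that follows it, so Read() must not be called again here;
                // otherwise an element that immediately follows would be skipped.
                var el = XNode.ReadFrom(reader) as XElement;
                if (el != null)
                {
                    yield return el;
                }
            }
            else
            {
                reader.Read();
            }
        }
    }
}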
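To try the two snippets above together, they can be combined into a small console program, roughly as sketched here. The class name XmlToCsvDemo, the method name ConvertFile, the temporary file paths and the tiny two-record input are all assumptions made for illustration; none of them come from the question. The sketch advances the reader only when the current node is not a matching element, so that two adjacent elements with no whitespace between them are both picked up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class XmlToCsvDemo
{
    // Streams each <elementName> element from a file that has no root element.
    private static IEnumerable<XElement> StreamElements(string fileName, string elementName)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(fileName, settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the element and moves to the node after it.
                    if (XNode.ReadFrom(reader) is XElement el)
                    {
                        yield return el;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Hypothetical helper: writes one CSV line per <XML> element.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (var element in StreamElements(inputPath, "XML"))
            {
                var values = element.DescendantNodes()
                    .OfType<XText>()
                    .Select(t => Regex.Replace(t.Value, "\\s+", " "));
                writer.WriteLine(string.Join(",", values));
            }
        }
    }

    public static void Main()
    {
        // Made-up input: two adjacent records, no root element.
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllText(input,
            "<XML><A>1,2</A><B>x,y</B></XML><XML><A>3,4</A><B>z</B></XML>");

        ConvertFile(input, output);
        Console.Write(File.ReadAllText(output));
    }
}
```

For this made-up input the output file should contain the two lines 1,2,x,y and 3,4,z. Only one element is held in memory at a time, so memory use depends on the size of the largest record and not on the size of the file.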