Writing a Large Amount of Data to Different Sources
Question
I have a large binary file (>24GB). I read it in as a stream, line by line. Each line is a "message".
I have c. 500 text files, each with a unique path. I wish to allocate each message to a text file depending on which "ID" the message has.
I wish to write ASCII text (not binary).
I currently have this working. The problem is that it's very slow. I think this is because, for each line, I do the following:
public static TextWriter datawriterB2T; //Done only once at the class level.

public static void ProcessMessage(Message message, string ID) //called for each message
{
    string activeFile = @"U:\processedText\" + ID + "\\" + "_myData.txt";
    datawriterB2T = new StreamWriter(activeFile, true);
    datawriterB2T.WriteLine(message.ToString());
    // Flush all the data from RAM and close the text file
    datawriterB2T.Flush();
    datawriterB2T.Dispose();
    datawriterB2T.Close();
}
Hence I am opening and closing a file a very large number of times. This is slow.
Would I be better off opening a large number of streams, keeping them open, and making use of the inbuilt buffering to let C# decide when to flush?
Can C# keep this number of streams open OK?
I was thinking something like:
using (StreamWriter sw_ID = new StreamWriter(myPath_ID))
for each of the 500 paths, and then passing the sw_ID object to my write function:
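public static void WriteMessage(StreamWriter sw_ID, Message message)
{
    // Write to the already-open writer; no open/close per message.
    sw_ID.WriteLine(message.ToString());
}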
Would this work?
Recommended answer
My first response is "try it and see".
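To make "try it" concrete, here is a minimal sketch of the idea from the question: keep one StreamWriter open per ID and let its internal buffer decide when to hit the disk. The class name WriterPool, its methods, and the 64 KB buffer size are assumptions made for illustration; the directory layout mirrors the path built in ProcessMessage.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original post): one open writer per ID.
public sealed class WriterPool : IDisposable
{
    private readonly string _root;
    private readonly Dictionary<string, StreamWriter> _writers =
        new Dictionary<string, StreamWriter>();

    public WriterPool(string root)
    {
        _root = root;
    }

    public void Write(string id, string line)
    {
        StreamWriter writer;
        if (!_writers.TryGetValue(id, out writer))
        {
            string dir = Path.Combine(_root, id);
            Directory.CreateDirectory(dir); // no-op if the directory already exists
            // append: true matches the original code; the 64 KB buffer is an arbitrary choice.
            writer = new StreamWriter(
                Path.Combine(dir, "_myData.txt"), true, Encoding.ASCII, 65536);
            _writers[id] = writer;
        }
        writer.WriteLine(line);
    }

    public void Dispose()
    {
        // Disposing a StreamWriter flushes its buffer and closes the file.
        foreach (StreamWriter writer in _writers.Values)
        {
            writer.Dispose();
        }
        _writers.Clear();
    }
}
```

With something like this, the per-message call would become pool.Write(ID, message.ToString()), with the pool created once before the read loop and disposed once after it.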
If it doesn't work, try opening the source file and parsing it 100 lines at a time into separate string-list buffers (created as they're needed). When a buffer "fills up", flush it to the appropriate file. This way, you keep memory use at a reasonable level and disk I/O under control.
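The buffering approach described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested solution for the 24GB case: the class name BufferedRouter, its methods, and the maxLines threshold are all hypothetical, and the output path layout is copied from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: collects lines per ID in memory and appends them in batches.
public sealed class BufferedRouter
{
    private readonly string _root;
    private readonly int _maxLines;
    private readonly Dictionary<string, List<string>> _buffers =
        new Dictionary<string, List<string>>();

    public BufferedRouter(string root, int maxLines)
    {
        _root = root;
        _maxLines = maxLines;
    }

    public void Add(string id, string line)
    {
        List<string> buffer;
        if (!_buffers.TryGetValue(id, out buffer))
        {
            buffer = new List<string>(); // created only when an ID is first seen
            _buffers[id] = buffer;
        }
        buffer.Add(line);
        if (buffer.Count >= _maxLines)
        {
            Flush(id, buffer); // the buffer has "filled up"
        }
    }

    // Call once after the source file has been fully read.
    public void FlushAll()
    {
        foreach (KeyValuePair<string, List<string>> entry in _buffers)
        {
            Flush(entry.Key, entry.Value);
        }
    }

    private void Flush(string id, List<string> buffer)
    {
        if (buffer.Count == 0) return;
        string dir = Path.Combine(_root, id);
        Directory.CreateDirectory(dir); // no-op if the directory already exists
        // One open/append/close per batch instead of one per message.
        File.AppendAllLines(Path.Combine(dir, "_myData.txt"), buffer, Encoding.ASCII);
        buffer.Clear();
    }
}
```

Each file is opened once per batch rather than once per message, and at most about 500 × maxLines lines are held in memory at any time, so the threshold is the knob that trades memory for fewer disk operations.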