Writing a Large Amount of Data to Different Sources


Problem description





I have a large binary file (>24GB). I read it in a stream, line by line. Each line is a "message".

I have c. 500 text files, each with a unique path. I wish to allocate each message to a text file conditional on what "ID" the message has.

I wish to write ASCII text (not binary).

I currently have this working. The problem is it's very slow. I think this is because, for each line, I do the following:



public static TextWriter datawriterB2T;   // Done only once at the class level.

public static void ProcessMessage(Message message, string ID) // called for each message
{
    string activeFile = Path.Combine(@"U:\processedText", ID, "_myData.txt");

    datawriterB2T = new StreamWriter(activeFile, true); // open for append

    datawriterB2T.WriteLine(message.ToString());

    // Flush the buffer from RAM and close the text file.
    // Dispose already flushes and closes, so separate Flush/Close calls are redundant.
    datawriterB2T.Dispose();
}








Hence I am opening and closing the I/O a very large number of times. This is slow.

Would I be better off opening a large number of streams, keeping them open, and making use of the built-in buffering to let C# decide when to flush?
Can C# keep this number of streams open OK?

I was thinking something like:

using (StreamWriter sw_ID = new StreamWriter(myPath_ID))






for each of the 500 paths and then passing the

sw_ID


object to my write function.

public static void WriteMessage(StreamWriter sw_ID, Message message)
{
    sw_ID.WriteLine(message.ToString());
}





Would this work?
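A minimal sketch of the keep-them-open idea, assuming writers are created lazily and cached in a dictionary keyed by ID (the `MessageRouter` class name is illustrative, not from the question; the path layout is the one the question uses):

```csharp
using System.Collections.Generic;
using System.IO;

public static class MessageRouter
{
    // One open StreamWriter per ID, created on first use and kept open.
    private static readonly Dictionary<string, StreamWriter> writers =
        new Dictionary<string, StreamWriter>();

    public static void WriteMessage(string id, string message)
    {
        StreamWriter sw;
        if (!writers.TryGetValue(id, out sw))
        {
            string path = Path.Combine(@"U:\processedText", id, "_myData.txt");
            sw = new StreamWriter(path, true); // append mode; buffers internally
            writers[id] = sw;
        }
        sw.WriteLine(message);
    }

    // Call once after the whole binary file has been processed.
    public static void CloseAll()
    {
        foreach (var sw in writers.Values)
            sw.Dispose(); // flushes and closes the underlying file
        writers.Clear();
    }
}
```

With this shape, each of the ~500 files is opened exactly once, and `StreamWriter`'s internal buffer decides when bytes actually hit the disk; the trade-off is ~500 open file handles held for the duration of the run.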

Answer

My first response is "try it and see".

If it doesn't work, try opening the source file and parsing it 100 lines at a time into various string list buffers (as they're needed), and when a buffer "fills up", flush it to the appropriate file. This way, you keep memory use to a reasonable level and disk IO under control.
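The buffering suggestion could be sketched like this; the `BufferedRouter` name, the 100-line threshold, and the use of `File.AppendAllLines` are illustrative choices, not part of the answer:

```csharp
using System.Collections.Generic;
using System.IO;

public static class BufferedRouter
{
    private const int FlushThreshold = 100; // lines buffered per ID before a disk write

    // In-memory line buffer per ID.
    private static readonly Dictionary<string, List<string>> buffers =
        new Dictionary<string, List<string>>();

    public static void Add(string id, string message)
    {
        List<string> lines;
        if (!buffers.TryGetValue(id, out lines))
        {
            lines = new List<string>();
            buffers[id] = lines;
        }
        lines.Add(message);
        if (lines.Count >= FlushThreshold)
            Flush(id, lines);
    }

    private static void Flush(string id, List<string> lines)
    {
        string path = Path.Combine(@"U:\processedText", id, "_myData.txt");
        File.AppendAllLines(path, lines); // opens, appends, and closes in one call
        lines.Clear();
    }

    // Flush any partially filled buffers at the end of the run.
    public static void FlushAll()
    {
        foreach (var pair in buffers)
            Flush(pair.Key, pair.Value);
    }
}
```

This keeps no file handles open between flushes, at the cost of one open/close per 100 lines per ID rather than one per line; remember to call `FlushAll()` after the last message so partial buffers are not lost.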

