Open and read thousands of files as fast as possible

Question

I need to open and read thousands of files as fast as possible.

I ran a few tests on 13,592 files and found Method 1 to be slightly faster than Method 2. The files are usually between 800 bytes and 4 kB. Is there anything I can do to make this I/O-bound process faster?

Method 1:
    Run 1: 3:05 (don't know what happened here)
    Run 2: 1:55
    Run 3: 2:06
    Run 4: 2:02
Method 2:
    Run 1: 2:04
    Run 2: 2:08
    Run 3: 2:04
    Run 4: 2:12

Here is the code:

using System;
using System.IO;
using System.Linq; // only needed for the commented-out Method 2

public class FileOpenerUtil
{

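    /// <summary>
    /// Reads a file into a string, retrying until the file can be opened.
    /// </summary>
    /// <param name="fullFilePath">Full path of the file to read</param>
    /// <returns>The file contents with '\n' line endings</returns>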
    public static string ReadFileToString(string fullFilePath)
    {
        while (true)
        {
            try
            {
                // Method 1
                using (StreamReader sr = File.OpenText(fullFilePath))
                {
                    string fullMessage = "";
                    string s;
                    while ((s = sr.ReadLine()) != null)
                    {
                        fullMessage += s + "\n";
                    }
                    return RemoveCarriageReturn(fullMessage);
                }
                // Method 2
                /*using (File.Open(fullFilePath, FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    Console.WriteLine("Output file {0} ready.", fullFilePath);
                    string[] lines = File.ReadAllLines(fullFilePath);
                    //Every new line under the previous line
                    string fullMessage = lines.Aggregate("", (current, s) => current + s + "\n");
                    return RemoveCarriageReturn(fullMessage);
                    //ninject kernel


                }*/
                // Method 3

            }
            catch (FileNotFoundException ex)
            {
                Console.WriteLine("Output file {0} not yet ready ({1})", fullFilePath, ex.Message);
            }
            catch (IOException ex)
            {
                Console.WriteLine("Output file {0} not yet ready ({1})", fullFilePath, ex.Message);
            }
            catch (UnauthorizedAccessException ex)
            {
                Console.WriteLine("Output file {0} not yet ready ({1})", fullFilePath, ex.Message);
            }
        }

    }

    /// <summary>
    /// Removes '\r' from a string
    /// </summary>
    /// <param name="message">The text that has to be changed</param>
    /// <returns>The changed text</returns>
    private static string RemoveCarriageReturn(string message)
    {
        return message.Replace("\r", "");
    }
}

The files I'm reading are .HL7 files and look like this:


MSH|^~\&|OAZIS||||20150430235954||ADT^A03|23669166|P|2.3||||||ASCII
EVN|A03|20150430235954||||201504302359
PID|1||6001144000||LastName^FirstName^^^Mevr.|LastName^FirstName|19600114|F|||GStreetName Number^^City^^PostalCode^B^H||09/3444556^^PH~0476519246echtg^^CP||NL|M||28783409^^^^VN|0000000000|60011402843||||||B||||N
PD1||||003847^LastName^FirstName||||||||N|||0
PV1|1|O|FDAG^000^053^001^0^2|NULL||FDAG^000^053^001|003847^LastName^FirstName||006813^LastName^FirstName|1900|00||||||006813^LastName^FirstName|0|28783409^^^^VN|1^20150430|01|||||||||||||||1|1||D|||||201504301336|201504302359
OBX|1|CE|KIND_OF_DIS|RCM|1^1 Op medisch advies
OBX|2|CE|DESTINATION_DIS|RCM|1^1 Terug naar huis

Once I have opened the file, I parse the string with j4jayant's HL7 parser and close the file.

Recommended answer

I used 50,000 files of varying size (500 to 1024 bytes).

Test 1: Your method 1 StreamReader sr = File.OpenText(fullFilePath); sr.ReadLine();
Seconds: 3.4658937968113
Test 2: Your method 2 File.ReadAllLines(fullFilePath)
Seconds: 5.5008349279222
Test 3: File.ReadAllText(fullFilePath);
Seconds: 3.30782645637133
Test 4: BinaryReader b = new BinaryReader; b.ReadString();
Seconds: 5.85779941381009
Test 5: Windows FileReader (https://msdn.microsoft.com/en-us/library/2d9wy99d.aspx)
Seconds: 3.07036554759848
Test 6: StreamReader sr = File.OpenText(fullFilePath); sr.ReadToEnd();
Seconds: 3.31464109255517
Test 7: StreamReader sr = File.OpenText(fullFilePath); sr.ReadToEnd();
Seconds: 3.3364683664508
Test 8: StreamReader sr = File.OpenText(fullFilePath); sr.ReadLine();
Seconds: 3.40426888695317
Test 9: FileStream + BufferedStream + StreamReader
Seconds: 4.02871911079061
Test 10: Parallel.For using code File.ReadAllText(fullFilePath);
Seconds: 0.89543632235447
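
The answer does not show how these timings were taken. One way to reproduce this kind of comparison is a small Stopwatch harness like the sketch below. The class name ReadBenchmark, the Time helper, and the temporary test files are assumptions made for illustration, not the code behind the numbers above.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ReadBenchmark
{
    // Times one read strategy over every file in the list and returns elapsed seconds.
    public static double Time(string[] files, Func<string, string> read)
    {
        Stopwatch sw = Stopwatch.StartNew();
        long totalChars = 0;
        foreach (string file in files)
        {
            totalChars += read(file).Length; // consume the result of every read
        }
        sw.Stop();
        Console.WriteLine("Read {0} chars from {1} files", totalChars, files.Length);
        return sw.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // Hypothetical test data: a temporary folder holding 100 small files.
        string dir = Path.Combine(Path.GetTempPath(), "read_bench_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        for (int i = 0; i < 100; i++)
        {
            File.WriteAllText(Path.Combine(dir, "readtext_" + i + ".txt"), new string('x', 800));
        }
        string[] files = Directory.GetFiles(dir);

        // Same idea as Test 3: one call per file.
        Console.WriteLine("ReadAllText: {0} s", Time(files, File.ReadAllText));

        // Same idea as Test 6: StreamReader + ReadToEnd.
        Console.WriteLine("ReadToEnd:   {0} s", Time(files, path =>
        {
            using (StreamReader sr = File.OpenText(path))
            {
                return sr.ReadToEnd();
            }
        }));

        Directory.Delete(dir, true);
    }
}
```

On a second pass the files are usually served from the operating system's file cache, so repeated runs over the same files tend to look faster than a first, cold read.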

The best single-threaded results are Test 5 and Test 3.
Test 3 uses File.ReadAllText(fullFilePath);
Test 5 uses Windows FileReader (https://msdn.microsoft.com/en-us/library/2d9wy99d.aspx)
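
As a rough sketch of what swapping the question's read loop for the Test 3 approach could look like: the class name FastFileReader and the method ReadNormalized are hypothetical. The sketch assumes the goal is the same '\n'-separated text that the ReadLine loop produces. Unlike that loop, it does not append a trailing '\n' when the file lacks one.

```csharp
using System;
using System.IO;

public static class FastFileReader
{
    // Reads the whole file in one call and normalizes line endings to '\n'.
    public static string ReadNormalized(string fullFilePath)
    {
        string text = File.ReadAllText(fullFilePath);
        // StreamReader.ReadLine treats "\r\n", "\r" and "\n" as line breaks; mirror that here.
        return text.Replace("\r\n", "\n").Replace('\r', '\n');
    }

    public static void Main()
    {
        // Hypothetical sample file mixing all three line-ending styles.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "MSH|a\r\nEVN|b\rPID|c\n");

        string result = ReadNormalized(path);
        Console.WriteLine(result == "MSH|a\nEVN|b\nPID|c\n");

        File.Delete(path);
    }
}
```

Mapping a lone '\r' to '\n' rather than deleting it matters for HL7, where segments are commonly terminated by a bare carriage return. Deleting it would merge segments onto one line.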

If you can use threads, Test 10 is by far the quickest.

Example:

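int maxFiles = 50000;
Parallel.For(0, maxFiles, x =>
{
    // Use the loop index x: a shared counter incremented inside the lambda
    // would race across threads and could skip or repeat file names.
    Util.Method1("readtext_" + x + ".txt"); // your read method
});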
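
When the file names are not numbered, the same idea can be sketched with Parallel.ForEach over a directory listing, collecting the results in a thread-safe dictionary. The class name ParallelReader, the ReadAll helper, and the temporary folder are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ParallelReader
{
    // Reads every matching file in a folder in parallel; returns path -> contents.
    public static ConcurrentDictionary<string, string> ReadAll(string folder, string pattern)
    {
        var contents = new ConcurrentDictionary<string, string>();
        Parallel.ForEach(Directory.EnumerateFiles(folder, pattern), path =>
        {
            // Each iteration writes only its own key, so no extra locking is needed.
            contents[path] = File.ReadAllText(path);
        });
        return contents;
    }

    public static void Main()
    {
        // Hypothetical test data: 50 small files in a temporary folder.
        string dir = Path.Combine(Path.GetTempPath(), "parallel_read_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        for (int i = 0; i < 50; i++)
        {
            File.WriteAllText(Path.Combine(dir, "readtext_" + i + ".txt"), "message " + i);
        }

        ConcurrentDictionary<string, string> result = ReadAll(dir, "*.txt");
        Console.WriteLine("Files read: " + result.Count);

        Directory.Delete(dir, true);
    }
}
```

Keying the results by path keeps each message tied to its file even though the iterations finish in no particular order.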



When using RAMMap to empty the standby list first (so that the files are not served from the file cache):

Test 1: Your method 1 StreamReader sr = File.OpenText(fullFilePath); sr.ReadLine();
Seconds: 15.1785750622961
Test 2: Your method 2 File.ReadAllLines(fullFilePath)
Seconds: 17.650864469466
Test 3: File.ReadAllText(fullFilePath);
Seconds: 14.8985912878328
Test 4: BinaryReader b = new BinaryReader; b.ReadString();
Seconds: 18.1603815767866
Test 5: Windows FileReader
Seconds: 14.5059765845334
Test 6: StreamReader sr = File.OpenText(fullFilePath); sr.ReadToEnd();
Seconds: 14.8649786336991
Test 7: StreamReader sr = File.OpenText(fullFilePath); sr.ReadToEnd();
Seconds: 14.830567197641
Test 8: StreamReader sr = File.OpenText(fullFilePath); sr.ReadLine();
Seconds: 14.9965866575751
Test 9: FileStream + BufferedStream + StreamReader
Seconds: 15.7336450516575
Test 10: Parallel.For() using code File.ReadAllText(fullFilePath);
Seconds: 4.11343060325439
