Is StreamReader.ReadLine() really the fastest method to count lines in a file?


Question


After looking around for a while, I found quite a few discussions on how to count the number of lines in a file.

For example these three:
c# how do I count lines in a textfile
Determine the number of lines within a text file
How to count lines fast?

So, I went ahead and ended up using what seems to be the most efficient (at least memory-wise?) method that I could find:

private static int countFileLines(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null) 
        { 
            i++; 
        }
        return i;
    }
}

But this takes forever when the lines themselves from the file are very long. Is there really not a faster solution to this?

I've been trying to use StreamReader.Read() or StreamReader.Peek(), but I can't (or don't know how to) make either of them move on to the next line as soon as there's 'stuff' (chars? text?).

Any ideas please?


CONCLUSION/RESULTS (After running some tests based on the answers provided):

I tested the 5 methods below on two different files and I got consistent results that seem to indicate that plain old StreamReader.ReadLine() is still one of the fastest ways... To be honest, I'm perplexed after all the comments and discussion in the answers.

File #1:
Size: 3,631 KB
Lines: 56,870

Results in seconds for File #1:
0.02 --> ReadLine method.
0.04 --> Read method.
0.29 --> ReadByte method.
0.25 --> ReadLines.Count method.
0.04 --> ReadWithBufferSize method.

File #2:
Size: 14,499 KB
Lines: 213,424

Results in seconds for File #2:
0.08 --> ReadLine method.
0.19 --> Read method.
1.15 --> ReadByte method.
1.02 --> ReadLines.Count method.
0.08 --> ReadWithBufferSize method.

Here are the 5 methods I tested based on all the feedback I received:

// Requires: using System.IO; using System.Linq;

// 1. Read each line as a string and count the lines.
private static int countWithReadLine(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null)
        {
            i++;
        }
        return i;
    }
}

// 2. Read one decoded character at a time and count '\n' (code 10).
// A final line without a trailing newline is not counted.
private static int countWithRead(string filePath)
{
    using (StreamReader _reader = new StreamReader(filePath))
    {
        int c = 0, count = 0;
        while ((c = _reader.Read()) != -1)
        {
            if (c == 10)
            {
                count++;
            }
        }
        return count;
    }
}

// 3. Read one raw byte at a time and count bytes equal to 10 (LF).
private static int countWithReadByte(string filePath)
{
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        int b;

        b = s.ReadByte();
        while (b >= 0)
        {
            if (b == 10)
            {
                i++;
            }
            b = s.ReadByte();
        }
        return i;
    }
}

// 4. Let LINQ count the lazily enumerated lines.
private static int countWithReadLinesCount(string filePath)
{
    return File.ReadLines(filePath).Count();
}

// 5. Read raw bytes in 512-byte chunks and count LF bytes in each chunk.
private static int countWithReadAndBufferSize(string filePath)
{
    int bufferSize = 512;

    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        byte[] b = new byte[bufferSize];
        int n = 0;

        n = s.Read(b, 0, bufferSize);
        while (n > 0)
        {
            i += countByteLines(b, n);
            n = s.Read(b, 0, bufferSize);
        }
        return i;
    }
}

// Counts the LF bytes among the first n bytes of b.
private static int countByteLines(byte[] b, int n)
{
    int i = 0;
    for (int j = 0; j < n; j++)
    {
        if (b[j] == 10)
        {
            i++;
        }
    }

    return i;
}
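
The question does not say how the timings above were taken. As a rough sketch only, one way to time each counting method could look like the following. The class name LineCountTimer, the method Measure, and the warm-up plus best-of-N approach are all assumptions for illustration, not the original poster's setup.

```csharp
using System;
using System.Diagnostics;

public static class LineCountTimer
{
    // Hypothetical helper: times a line-counting function on one file.
    // One untimed warm-up call absorbs JIT compilation and fills the OS file cache,
    // then the shortest of `runs` timed calls is returned.
    public static TimeSpan Measure(Func<string, int> counter, string filePath, int runs = 5)
    {
        counter(filePath); // warm-up, not timed

        TimeSpan best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            counter(filePath);
            sw.Stop();

            if (sw.Elapsed < best)
            {
                best = sw.Elapsed;
            }
        }
        return best;
    }
}
```

Because the warm-up call leaves the file in the OS cache, numbers obtained this way mostly reflect CPU cost rather than disk speed, which may be part of why small files show such small differences between methods.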

Answer

No, it is not. The point is that ReadLine() materializes every line as a string, which counting does not need.

To count lines, you are much better off ignoring the "string" part and focusing on the "line" part.

A line is a series of bytes ending with \r\n (13, 10: CR LF) or another end-of-line marker.

Just run along the bytes of a buffered stream, counting the occurrences of your end-of-line marker.
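
The idea above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the class name LineCounter, the method CountNewlines, and the 64 KB default buffer are assumptions chosen for the example. It also assumes an encoding such as ASCII or UTF-8, where the byte 10 only ever represents a line feed.

```csharp
using System;
using System.IO;

public static class LineCounter
{
    // Counts LF bytes (10). A CR LF pair contains exactly one LF, so it is counted once.
    // Assumption: ASCII/UTF-8 input; in UTF-16 a byte of 10 can belong to another character.
    public static long CountNewlines(string filePath, int bufferSize = 64 * 1024)
    {
        long count = 0;
        byte[] buffer = new byte[bufferSize];

        using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize, FileOptions.SequentialScan))
        {
            int read;
            // Read returns 0 at end of file; it may return fewer bytes than requested.
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

Note that this counts end-of-line markers, not lines: a file whose last line has no trailing newline yields one less than the ReadLine() loop. If the two must agree, the caller would need to add one when the file is non-empty and its final byte is not 10.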
