获取非常大的文本文件的最后 10 行 >10GB [英] Get last 10 lines of very large text file > 10GB

查看:24
本文介绍了获取非常大的文本文件的最后 10 行 >10GB的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

显示一个非常大的文本文件(这个特定文件超过 10GB)的最后 10 行的最有效方法是什么?我想只写一个简单的 C# 应用程序,但我不知道如何有效地做到这一点.

What is the most efficient way to display the last 10 lines of a very large text file (this particular file is over 10GB). I was thinking of just writing a simple C# app but I'm not sure how to do this effectively.

推荐答案

读到文件末尾,然后向后查找直到找到十个换行符,然后考虑到各种编码,向前读到最后.一定要处理文件中的行数少于十的情况.下面是一个实现(在 C# 中,因为你标记了这个),概括为在位于 path 的文件中找到最后一个 numberOfTokensencoding 编码,其中令牌分隔符由 tokenSeparator 表示;结果作为 string 返回(这可以通过返回枚举标记的 IEnumerable 来改进).

Read to the end of the file, then seek backwards until you find ten newlines, and then read forward to the end taking into consideration various encodings. Be sure to handle cases where the number of lines in the file is less than ten. Below is an implementation (in C# as you tagged this), generalized to find the last numberOfTokens in the file located at path encoded in encoding where the token separator is represented by tokenSeparator; the result is returned as a string (this could be improved by returning an IEnumerable<string> that enumerates the tokens).

public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator) {

    int sizeOfChar = encoding.GetByteCount("
");
    byte[] buffer = encoding.GetBytes(tokenSeparator);


    using (FileStream fs = new FileStream(path, FileMode.Open)) {
        Int64 tokenCount = 0;
        Int64 endPosition = fs.Length / sizeOfChar;

        for (Int64 position = sizeOfChar; position < endPosition; position += sizeOfChar) {
            fs.Seek(-position, SeekOrigin.End);
            fs.Read(buffer, 0, buffer.Length);

            if (encoding.GetString(buffer) == tokenSeparator) {
                tokenCount++;
                if (tokenCount == numberOfTokens) {
                    byte[] returnBuffer = new byte[fs.Length - fs.Position];
                    fs.Read(returnBuffer, 0, returnBuffer.Length);
                    return encoding.GetString(returnBuffer);
                }
            }
        }

        // handle case where number of tokens in file is less than numberOfTokens
        fs.Seek(0, SeekOrigin.Begin);
        buffer = new byte[fs.Length];
        fs.Read(buffer, 0, buffer.Length);
        return encoding.GetString(buffer);
    }
}

这篇关于获取非常大的文本文件的最后 10 行 &gt;10GB的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆