Get last 10 lines of very large text file > 10GB
Problem description
What is the most efficient way to display the last 10 lines of a very large text file (this particular file is over 10GB)? I was thinking of just writing a simple C# app, but I'm not sure how to do this efficiently.
Recommended answer
Seek to the end of the file, then scan backwards until you find ten newlines, and then read forward to the end, taking the file's encoding into account. Be sure to handle the case where the file has fewer than ten lines. Below is an implementation (in C#, as you tagged this), generalized to find the last numberOfTokens tokens in the file located at path, encoded in encoding, where the token separator is represented by tokenSeparator; the result is returned as a string (this could be improved by returning an IEnumerable<string> that enumerates the tokens).
// Requires: using System; using System.IO; using System.Text;
public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator) {
    // Byte width of "\n" in this encoding; the backward scan moves by this many bytes per step.
    int sizeOfChar = encoding.GetByteCount("\n");
    byte[] buffer = encoding.GetBytes(tokenSeparator);

    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read)) {
        Int64 tokenCount = 0;

        // position is a byte offset measured back from the end of the file.
        // Start far enough back to read one whole separator, and stop at the start of the file.
        for (Int64 position = buffer.Length; position <= fs.Length; position += sizeOfChar) {
            fs.Seek(-position, SeekOrigin.End);
            fs.Read(buffer, 0, buffer.Length);

            // Note: a separator at the very end of the file is counted as well.
            if (encoding.GetString(buffer) == tokenSeparator) {
                tokenCount++;
                if (tokenCount == numberOfTokens) {
                    // The stream is now positioned just past the separator; return the rest of the file.
                    byte[] returnBuffer = new byte[fs.Length - fs.Position];
                    fs.Read(returnBuffer, 0, returnBuffer.Length);
                    return encoding.GetString(returnBuffer);
                }
            }
        }

        // handle case where number of tokens in file is less than numberOfTokens
        fs.Seek(0, SeekOrigin.Begin);
        buffer = new byte[fs.Length];
        fs.Read(buffer, 0, buffer.Length);
        return encoding.GetString(buffer);
    }
}
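The method above seeks once per character, which can be slow, and it returns a single string. As a rough sketch of the IEnumerable<string> improvement mentioned in the answer, the variant below scans backwards in blocks and then yields the lines one at a time. It is not part of the original answer: the names TailSketch, ReadLastLines, FindStartOffset and the 64 KB default for blockSize are hypothetical, and it assumes an encoding in which the newline is the single byte 0x0A (ASCII or UTF-8), so it would not work as written for UTF-16 or UTF-32.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TailSketch
{
    // Hypothetical helper: lazily yields the last lineCount lines of the file at path.
    // Assumption: '\n' is encoded as the single byte 0x0A (ASCII, UTF-8).
    public static IEnumerable<string> ReadLastLines(
        string path, int lineCount, Encoding encoding, int blockSize = 64 * 1024)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // Locate the byte offset where the last lineCount lines begin, then read forward.
            long start = FindStartOffset(fs, lineCount, blockSize);
            fs.Seek(start, SeekOrigin.Begin);
            using (var reader = new StreamReader(fs, encoding))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    yield return line;
            }
        }
    }

    // Scans backwards block by block, counting newline bytes.
    // Returns 0 when the file holds fewer than lineCount lines.
    private static long FindStartOffset(FileStream fs, int lineCount, int blockSize)
    {
        if (lineCount <= 0) return fs.Length;

        var block = new byte[blockSize];
        long position = fs.Length; // every byte at or after this offset has been scanned
        int newlines = 0;

        while (position > 0)
        {
            int toRead = (int)Math.Min(blockSize, position);
            position -= toRead;
            fs.Seek(position, SeekOrigin.Begin);

            // Read may return fewer bytes than requested, so loop until the block is full.
            int read = 0;
            while (read < toRead)
            {
                int n = fs.Read(block, read, toRead - read);
                if (n == 0) break;
                read += n;
            }

            for (int i = read - 1; i >= 0; i--)
            {
                if (block[i] != (byte)'\n') continue;
                // A newline that is the final byte only terminates the last line.
                if (position + i == fs.Length - 1) continue;
                if (++newlines == lineCount) return position + i + 1;
            }
        }
        return 0;
    }
}
```

A call such as TailSketch.ReadLastLines(path, 10, Encoding.UTF8) could then be consumed with foreach and written to the console line by line. Only the tail of the file is ever decoded, so memory use depends on blockSize and the length of the last lines rather than on the size of the file.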