How Do I Read and Process a Big Log File in C#?
Question
I have a server log file with almost half a million lines. I want to read this file, extract the IP address from each line,
and then rank the IP addresses by how often they are logged.
Recommended Answer
The easiest way is probably to use a Dictionary:
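// Needs: using System.Collections.Generic; using System.IO; using System.Net; using System.Text.RegularExpressions;
Dictionary<IPAddress, int> ipInstances = new Dictionary<IPAddress, int>();
// Assumes IPv4 addresses; takes the first dotted quad on each line
Regex ipPattern = new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b");
string[] lines = File.ReadAllLines(path);
foreach (string line in lines)
{
    // Extract the IP from the line; skip lines without a parseable address
    Match match = ipPattern.Match(line);
    if (!match.Success || !IPAddress.TryParse(match.Value, out IPAddress ip))
    {
        continue;
    }
    if (!ipInstances.ContainsKey(ip))
    {
        ipInstances.Add(ip, 0);
    }
    ipInstances[ip]++;
}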
You can then rank them using a simple LINQ query:
// Needs: using System.Linq; yields the addresses from most to least frequent
var ranked = ipInstances.OrderByDescending(kvp => kvp.Value).Select(kvp => kvp.Key);
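The LINQ line above yields only the addresses. If you also want the hit counts and a top-N cut-off, a minimal sketch could look like the following. The class and method names (IpRanking, TopAddresses, Format) are hypothetical, and breaking ties by the address text is an assumption the answer does not make.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class IpRanking
{
    // Returns the `top` most frequent addresses with their counts, highest first.
    // Assumption: ties are broken by the address text (ordinal) so the order is deterministic.
    public static List<KeyValuePair<IPAddress, int>> TopAddresses(
        IDictionary<IPAddress, int> counts, int top)
    {
        return counts
            .OrderByDescending(kvp => kvp.Value)
            .ThenBy(kvp => kvp.Key.ToString(), StringComparer.Ordinal)
            .Take(top)
            .ToList();
    }

    // Formats one result per line, e.g. "1. 10.0.0.1 (3 hits)".
    public static IEnumerable<string> Format(IEnumerable<KeyValuePair<IPAddress, int>> ranked)
    {
        return ranked.Select((kvp, index) => $"{index + 1}. {kvp.Key} ({kvp.Value} hits)");
    }
}
```

With the ipInstances dictionary from the answer, IpRanking.TopAddresses(ipInstances, 10) would return the ten busiest addresses together with their counts.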
[Suggestion -- Matt T Heffron]
If loading the whole file into memory at once is a concern, a one-line change should alleviate the issue:
Change:
string[] lines = File.ReadAllLines(path);
to be:
IEnumerable<string> lines = File.ReadLines(path);
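Using File.ReadLines() instead of File.ReadAllLines() lets the foreach loop deal with the lines one at a time: ReadLines returns a lazily evaluated IEnumerable<string>, so the file is read as the loop advances rather than being loaded into an array up front.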
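If you prefer a single LINQ query to the manual dictionary loop, File.ReadLines also composes with GroupBy. This is a swapped-in alternative, not part of Matt T Heffron's suggestion; the class name LinqIpRanker, the IPv4-only regular expression and the "first address per line" rule are assumptions. Note the trade-off: the file itself is streamed, but GroupBy keeps every matched IPAddress inside its group until the query completes, whereas a Dictionary<IPAddress, int> keeps only one counter per distinct address.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class LinqIpRanker
{
    // Assumption: IPv4 logs; the first dotted quad on each line is the client address.
    private static readonly Regex Ipv4 = new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b");

    // Returns the first parseable IPv4 address on the line, or null if there is none.
    private static IPAddress FindIp(string line)
    {
        Match m = Ipv4.Match(line);
        return m.Success && IPAddress.TryParse(m.Value, out IPAddress ip) ? ip : null;
    }

    // Ranks addresses from any sequence of lines, most frequent first.
    // The lines are consumed lazily, but GroupBy buffers every matched address.
    public static List<KeyValuePair<IPAddress, int>> Rank(IEnumerable<string> lines)
    {
        return lines
            .Select(line => FindIp(line))
            .Where(ip => ip != null)
            .GroupBy(ip => ip)
            .Select(g => new KeyValuePair<IPAddress, int>(g.Key, g.Count()))
            .OrderByDescending(kvp => kvp.Value)
            .ToList();
    }

    // File.ReadLines streams the file instead of reading it into a string[] first.
    public static List<KeyValuePair<IPAddress, int>> RankFile(string path)
    {
        return Rank(File.ReadLines(path));
    }
}
```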
Quite simple:
// Needs: using System.IO;
string path = @"C:\Somewhere\SomeLogFile.log";
string line;
using (StreamReader sr = File.OpenText(path)) {
    // ReadLine returns null once the end of the file is reached
    while ((line = sr.ReadLine()) != null) {
        // Do whatever you have to do with line variable
    }
}
You have the skeleton. Now it's up to you to handle:
- the IP address extraction (tip: a regular expression could be suitable here)
- the count of each IP address (tip: a Dictionary<IPAddress, int> could be suitable here)
Good luck.
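Putting the two tips together, the skeleton might be filled in roughly as follows. This is a sketch, not the answerer's code: SkeletonIpCounter, Count and CountFile are hypothetical names, the regular expression assumes dotted-quad IPv4 addresses, and only the first address on each line is counted. Taking a TextReader instead of a path lets the same loop run over File.OpenText(path) or an in-memory StringReader.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class SkeletonIpCounter
{
    // Tip 1: a regular expression for the address (assumes dotted-quad IPv4).
    private static readonly Regex Ipv4 = new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b");

    // Same ReadLine loop as the skeleton, over any TextReader.
    public static Dictionary<IPAddress, int> Count(TextReader reader)
    {
        // Tip 2: a Dictionary<IPAddress, int> with one counter per address.
        var counts = new Dictionary<IPAddress, int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match match = Ipv4.Match(line);
            if (!match.Success || !IPAddress.TryParse(match.Value, out IPAddress ip))
            {
                continue; // no usable address on this line
            }
            counts.TryGetValue(ip, out int current); // current is 0 when the key is new
            counts[ip] = current + 1;
        }
        return counts;
    }

    // Opens the file exactly as the skeleton does and disposes the reader afterwards.
    public static Dictionary<IPAddress, int> CountFile(string path)
    {
        using (StreamReader sr = File.OpenText(path))
        {
            return Count(sr);
        }
    }
}
```

The resulting dictionary can then be ranked with the same OrderByDescending query shown in the first answer.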