Bash Script: count unique lines in file


Problem description

I have a large file (millions of lines) containing IP addresses and ports from a several-hour network capture, one IP/port per line. Lines are of this format:

ip.ad.dre.ss[:port]

Desired result:

There is an entry for each packet I received while logging, so there are a lot of duplicate addresses. I'd like to run this through a shell script of some sort that reduces it to lines of the format

ip.ad.dre.ss[:port] count

where count is the number of occurrences of that specific address (and port). No special work has to be done, treat different ports as different addresses.

So far, I'm using this command to scrape all of the ip addresses from the log file:

grep -o -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(:[0-9]+)?' ip_traffic-1.log > ips.txt

From that, I can use a fairly simple regex to strip out all of the IP addresses that were sent by my own address (which I don't care about).

I can then use the following to extract the unique entries:

sort -u ips.txt > intermediate.txt
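
Note that sort -u collapses the duplicates but also discards how many times each line appeared, which is exactly the number I'm after. A tiny illustration (the addresses are made up):

$ printf '1.2.3.4:80\n1.2.3.4:80\n5.6.7.8:443\n' | sort -u
1.2.3.4:80
5.6.7.8:443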

I don't know how I can aggregate the line counts with sort.

Recommended answer

You can use the uniq command to get counts of sorted repeated lines:

sort ips.txt | uniq -c
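
For example, on the same kind of made-up sample as above, uniq -c prefixes each distinct line with its occurrence count (printed right-aligned with leading blanks):

$ printf '1.2.3.4:80\n1.2.3.4:80\n5.6.7.8:443\n' | sort | uniq -c
      2 1.2.3.4:80
      1 5.6.7.8:443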

To get the most frequent results at top (thanks to Peter Jaric):

sort ips.txt | uniq -c | sort -bgr
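
In that final sort, -b ignores the leading blanks that uniq -c puts in front of each count, -g compares by general numeric value, and -r reverses the order so the largest counts come first. As one end-to-end pipeline (a sketch based on the commands above; ip_counts.txt is just an assumed output name):

grep -o -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(:[0-9]+)?' ip_traffic-1.log \
    | sort | uniq -c | sort -bgr > ip_counts.txt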
