Bash Script: count unique lines in file
Question
I have a large file (millions of lines) containing IP addresses and ports from a several-hour network capture, one IP/port per line. Lines are of this format:
ip.ad.dre.ss[:port]
Desired result:
There is an entry for each packet I received while logging, so there are a lot of duplicate addresses. I'd like to be able to run this through a shell script of some sort which will reduce it to lines of the format
ip.ad.dre.ss[:port] count
where count is the number of occurrences of that specific address (and port). No special work has to be done; treat different ports as different addresses.
So far, I'm using this command to scrape all of the IP addresses from the log file:
grep -o -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(:[0-9]+)?' ip_traffic-1.log > ips.txt
From that, I can use a fairly simple regex to scrape out all of the IP addresses that were sent by my address (which I don't care about).
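For instance, a minimal sketch of that filtering step, assuming a hypothetical local address of 10.0.0.1 (substitute your own):
# hypothetical: drop entries sent from my own address, 10.0.0.1
grep -o -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(:[0-9]+)?' ip_traffic-1.log \
  | grep -v -E '^10\.0\.0\.1(:|$)' > ips.txt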
I can then use the following to extract the unique entries:
sort -u ips.txt > intermediate.txt
I don't know how I can aggregate the line counts somehow with sort.
Answer
You can use the uniq command to get counts of sorted repeated lines:
sort ips.txt | uniq -c
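With a small hypothetical ips.txt, that output looks like this (uniq -c prefixes each distinct line with the number of times it occurred):
      1 10.0.0.2:22
      3 10.0.0.2:80
      1 10.0.0.3:443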
To get the most frequent results at top (thanks to Peter Jaric):
sort ips.txt | uniq -c | sort -bgr
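Here -b ignores the leading blanks that uniq -c adds, -g compares by general numeric value, and -r reverses the order so the largest counts come first. If you want the count printed after the address instead, matching the ip.ad.dre.ss[:port] count format from the question, one possible single-pass sketch with awk (an alternative, not part of the original answer):
awk '{ count[$0]++ } END { for (line in count) print line, count[line] }' ips.txt | sort -k2 -rn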