Best way to simulate "group by" from bash?

Question

Suppose you have a file that contains IP addresses, one address in each line:

10.0.10.1
10.0.10.1
10.0.10.3
10.0.10.2
10.0.10.1

You need a shell script that counts for each IP address how many times it appears in the file. For the previous input you need the following output:

10.0.10.1 3
10.0.10.2 1
10.0.10.3 1

One way to do this is:

cat ip_addresses | uniq | while read -r ip
do
    echo -n "$ip "
    grep -c "$ip" ip_addresses
done

However, this is far from efficient.

How would you solve this problem more efficiently using bash?

(One thing to add: I know it can be solved from perl or awk, I'm interested in a better solution in bash, not in those languages.)

ADDITIONAL INFO:

Suppose that the source file is 5GB and the machine running the algorithm has 4GB of memory. So sort is not an efficient solution, nor is reading the file more than once.

I liked the hashtable-like solution. Can anybody provide improvements to that solution?
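
For reference, a hashtable-like approach in pure bash could look like the sketch below. It is only an illustration, not the answer that was originally posted: it assumes bash 4.0 or newer (for associative arrays), and the names count_ips and counts are made up for this example. It reads the input once and keeps one counter per distinct address, so memory use depends on the number of distinct IPs rather than on the file size. A bash read loop is likely to be slow on a multi-gigabyte file.

```bash
#!/usr/bin/env bash
# Sketch only: single-pass counting with an associative array.
# Assumes bash >= 4.0; count_ips and counts are hypothetical names.

count_ips() {
    local -A counts=()   # maps each distinct line (IP) to its count
    local ip
    # The "|| [[ -n $ip ]]" part handles a last line without a trailing newline.
    while IFS= read -r ip || [[ -n $ip ]]; do
        [[ -n $ip ]] || continue                      # skip empty lines
        counts[$ip]=$(( ${counts[$ip]:-0} + 1 ))      # increment, starting from 0
    done
    # Iteration order of an associative array is unspecified.
    for ip in "${!counts[@]}"; do
        printf '%s %s\n' "$ip" "${counts[$ip]}"
    done
}

# Demo with the sample input from the question.
printf '%s\n' 10.0.10.1 10.0.10.1 10.0.10.3 10.0.10.2 10.0.10.1 | count_ips
```

With a real file the function would be called as count_ips < ip_addresses. The output lines come out in no particular order; pipe the result through sort if a stable order is needed, which is cheap because only the distinct addresses are sorted.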

ADDITIONAL INFO #2:

Some people asked why I would bother doing this in bash when it is far easier in, for example, perl. The reason is that perl wasn't available to me on the machine where I had to do this. It was a custom-built Linux machine without most of the tools I'm used to. And I think it was an interesting problem.

So please, don't blame the question, just ignore it if you don't like it. :-)

Solution

sort ip_addresses | uniq -c

This will print the count first, but other than that it should be exactly what you want.
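
If the count has to come after the address, as in the output shown in the question, the columns can be swapped without leaving bash. The following is a small sketch under that assumption; the function name count_sorted is made up for this example. It relies on read splitting each line of uniq -c output on whitespace, which also drops the leading padding that uniq -c adds.

```bash
#!/usr/bin/env bash
# Sketch only: count with sort | uniq -c, then print "IP count"
# instead of "count IP". count_sorted is a hypothetical name.

count_sorted() {
    sort | uniq -c | while read -r count ip; do
        # read strips the leading spaces and splits into the two fields
        printf '%s %s\n' "$ip" "$count"
    done
}

# Demo with the sample input from the question.
printf '%s\n' 10.0.10.1 10.0.10.1 10.0.10.3 10.0.10.2 10.0.10.1 | count_sorted
# prints:
# 10.0.10.1 3
# 10.0.10.2 1
# 10.0.10.3 1
```

With a real file this would be count_sorted < ip_addresses.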
