Sorting and counting method faster than cat file | sort | uniq -c

Problem description

I have the following script that parses some |-delimited field/value pairs. Sample data looks like |Apple=32.23|Banana =1232.12|Grape=12312|Pear=231|Grape=1231|

I am just looking to count how many times A, B, or C field names appear in the log file. The field list needs to be dynamic. Log files are 'big', about 500 megs each, so it takes a while to sort each file. Is there a faster way to do the count once I do the cut and get a file with one field per line?

 cat /bb/logs/$dir/$file.txt | tr -s "|" "\n" | cut -d "=" -f 1 | sort | uniq -c > /data/logs/$dir/$file.txt.count

I know for a fact that this part runs fast; I can see with certainty that it is the sort where the pipeline gets bogged down:

cat /bb/logs/$dir/$file.txt | tr -s "|" "\n" | cut -d "=" -f 1 

After I have run the cut, a sample of the output is below; of course, the real file is much longer:

Apple
Banana
Grape
Pear
Grape

After the sort and count I get:

 1 Apple
 1 Banana 
 1 Pear
 2 Grape

The problem is that the sort for my actual data takes way too long. I think it would be faster to > the output of the cut to a file, but I am not sure of the fastest way to count unique entries in a 'large' text file.
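
As a side note, if the sort-based pipeline is kept, GNU coreutils sort can often be sped up considerably by forcing byte-order collation and giving it a larger in-memory buffer. A sketch (it assumes GNU sort is available; the 1G buffer size is only an illustrative value):

 # LC_ALL=C makes sort compare raw bytes instead of doing locale-aware
 # collation; -S lets GNU sort use more RAM before spilling to temp files
 tr -s "|" "\n" < /bb/logs/$dir/$file.txt | cut -d "=" -f 1 |
     LC_ALL=C sort -S 1G | uniq -c > /data/logs/$dir/$file.txt.count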

Recommended answer

AWK can do it pretty well without sorting. Try this; it should perform better:

cat test.txt | tr -s "|" "\n" | cut -d "=" -f 1 |
   awk '{count[$1]++}END{for(c in count) print c,"found "count[c]" times."}' 
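
The tr and cut stages can also be folded into awk itself, so the whole count becomes a single pass over the file. A sketch, assuming the same |field=value| layout as the sample above (untested against the real logs; note that, like the original pipeline, it keeps any stray whitespace in the names, e.g. 'Banana '):

 awk -F'|' '{
     for (i = 1; i <= NF; i++) {
         if ($i == "") continue   # skip empty tokens before the first and after the last |
         name = $i
         sub(/=.*/, "", name)     # keep only the field name to the left of the =
         count[name]++
     }
 }
 END { for (c in count) print count[c], c }' /bb/logs/$dir/$file.txt

Note that for (c in count) visits the names in an unspecified order; pipe the result through sort if ordered output matters.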
