How to sort a 3 GB access log file?


Problem description

Hi all: I have a 3 GB Tomcat access log named urls, where each line is a URL. I want to count the occurrences of each URL and sort the URLs by their counts. I did it this way:

awk '{print $0}' urls | sort | uniq -c | sort -nr >> output

But it is taking a really long time to finish: it has already been running for 30 minutes and is still going. The log file looks like this:

/open_api/borrow_business/get_apply_by_user
/open_api/borrow_business/get_apply_by_user
/open_api/borrow_business/get_apply_by_user
/open_api/borrow_business/get_apply_by_user
/loan/recent_apply_info?passportId=Y20151206000011745
/loan/recent_apply_info?passportId=Y20160331000000423
/open_api/borrow_business/get_apply_by_user
...

Is there any other way I could process and sort a 3 GB file? Thanks in advance!

Recommended answer

I'm not sure why you're using awk at the moment: `awk '{print $0}'` just copies every line through unchanged, so it does nothing useful.

I would suggest using something like this:

awk '{ ++urls[$0] } END { for (i in urls) print urls[i], i }' urls | sort -nr

This builds up a count of each URL in a single pass (an in-memory hash of distinct URLs, rather than sorting all 3 GB of lines first) and then sorts only the much smaller count output.
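As a quick sanity check, here is the same one-liner run on a handful of sample lines piped in directly (the sample paths are made up for illustration; the real input would be the urls file from the question):

```shell
# Count each distinct line once in awk, then numeric-reverse sort the counts.
printf '%s\n' /a /b /a /a /b /c \
  | awk '{ ++urls[$0] } END { for (i in urls) print urls[i], i }' \
  | sort -nr
# 3 /a
# 2 /b
# 1 /c
```

Because awk only emits one line per distinct URL, `sort -nr` here sorts a few thousand unique URLs instead of hundreds of millions of raw log lines, which is where the time savings come from.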
