Group files and pipe to awk command

Question

I have files in a directory; they are named using YYYY_MM_DD:

-rw-r--r-- 1 root root 497186 Apr 21 13:17 2012_03_25
-rw-r--r-- 1 root root 490558 Apr 21 13:17 2012_03_26
-rw-r--r-- 1 root root 488797 Apr 21 13:17 2012_03_27
-rw-r--r-- 1 root root 316290 Apr 21 13:17 2012_03_28
-rw-r--r-- 1 root root 490081 Apr 21 13:17 2012_03_29
-rw-r--r-- 1 root root 486621 Apr 21 13:17 2012_03_30
-rw-r--r-- 1 root root 490904 Apr 21 13:17 2012_03_31
-rw-r--r-- 1 root root 491788 Apr 21 13:17 2012_04_01
-rw-r--r-- 1 root root 488630 Apr 21 13:17 2012_04_02

The first column within the file is a number, and I am using the following awk command to take an average of that first column.

awk -F, '{ x += $1 } END { print x/NR }' MyFile

Using the same command, I can pass two files to awk to get the overall average across both files.

awk -F, '{ x += $1 } END { print x/NR }' File1 File2
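As a small illustration of why this gives one combined average rather than an average of per-file averages: `NR` keeps counting across all input files. The sketch below uses made-up sample data in a temporary directory; the file contents and the `demo` and `combined` variable names are assumptions for the example only.

```shell
# Hypothetical sample data in a throwaway directory.
demo=$(mktemp -d)
printf '10,a\n20,b\n' > "$demo/File1"
printf '60,c\n'       > "$demo/File2"

# NR is the total record count over both files, so this computes (10+20+60)/3.
combined=$(awk -F, '{ x += $1 } END { print x/NR }' "$demo/File1" "$demo/File2")
echo "combined average: $combined"

rm -r "$demo"
```

With this data the per-file averages are 15 and 60, while the combined figure weights every line equally.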

What I want to do is this...

I want to get all the files in my directory, and group them per month, then pass all the files for the month to the awk command.

So, going by the sample data, there are 7 files in March, and I would want all 7 files to be passed to my awk command like this:

awk -F, '{ x += $1 } END { print x/NR }' File1 File2 File3 File4 File5 File6 File7

Then likewise for April's set.

Recommended answer

Are you wanting to somehow accomplish this with awk alone, or can you use file globbing? For example:

awk -F, '{ x += $1 } END { print x/NR }' 2012_03_[0-3][0-9]

will get all the March files.

You could also use 2012_03* but that's less specific in its globbing pattern than the above one.
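To make the difference in specificity concrete, here is a minimal sketch comparing the two patterns. The stray file `2012_03_25.bak` and the variable names are invented for the example.

```shell
# Hypothetical directory with one stray backup file alongside the dated files.
demo=$(mktemp -d)
touch "$demo/2012_03_25" "$demo/2012_03_26" "$demo/2012_03_25.bak" "$demo/2012_04_01"

# The strict pattern requires exactly two trailing digits; the loose one accepts any suffix.
strict=$(cd "$demo" && echo 2012_03_[0-3][0-9])
loose=$(cd "$demo" && echo 2012_03*)
echo "strict: $strict"
echo "loose:  $loose"

rm -r "$demo"
```

Only the loose pattern picks up the `.bak` file, which would then be averaged in with the real data.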

Edit

You can use a shell script like this:

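DIR="/tmp/tmp"
# Collect the distinct YYYY_MM prefixes, then average each month's files together.
for month in $(find "$DIR" -maxdepth 1 -type f | sed 's/.*\/\([0-9]\{4\}_[0-9]\{2\}\).*/\1/' | sort -u); do
  # The output directory (output/dir here) must already exist.
  awk -F, '{ x += $1 } END { print x/NR }' "$DIR/${month}"_[0-3][0-9] > output/dir/SUM_"${month}"
done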
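The month list that drives the loop comes from the `sed | sort -u` stage. A quick way to see what it produces is to feed it a few sample paths directly; the paths below are illustrative stand-ins for what `find` would print.

```shell
# Feed sample paths through the same sed expression used in the loop.
months=$(printf '%s\n' /tmp/tmp/2012_03_25 /tmp/tmp/2012_03_26 /tmp/tmp/2012_04_01 |
  sed 's/.*\/\([0-9]\{4\}_[0-9]\{2\}\).*/\1/' | sort -u)
echo "$months"
```

Each path is reduced to its YYYY_MM prefix, and `sort -u` collapses the duplicates so every month is processed once.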

As always, there are a few caveats. File names containing spaces will break it. If the directory contains files that don't conform to the YYYY_MM_DD format, you'll get errors, but they shouldn't affect the results for the months that do match. Let me know if those constraints are not acceptable and I'll think on it a little more.
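If those caveats matter, one possible alternative (a sketch, not the approach given in the answer) is to let a strict glob select the files and have awk do the per-month grouping itself from `FILENAME`. The sample data, the `demo` and `result` variables, and the `notes.txt` stray file are all assumptions for illustration.

```shell
# Hypothetical sample data, including one file that does not follow YYYY_MM_DD.
demo=$(mktemp -d)
printf '10\n20\n' > "$demo/2012_03_25"
printf '30\n'     > "$demo/2012_03_26"
printf '5\n7\n'   > "$demo/2012_04_01"
printf '999\n'    > "$demo/notes.txt"   # not matched by the glob, so it is skipped

# Quoting "$demo" keeps a directory path with spaces intact; the glob admits only dated names.
result=$(awk -F, '
  FNR == 1 { n = split(FILENAME, p, "/"); month = substr(p[n], 1, 7) }  # e.g. 2012_03
  { sum[month] += $1; cnt[month]++ }
  END { for (m in sum) print m, sum[m] / cnt[m] }
' "$demo"/[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9] | sort)
echo "$result"

rm -r "$demo"
```

This prints one line per month with its average. One remaining limitation: if no file matches the glob at all, the shell passes the pattern through literally and awk reports a missing file.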
