Group files and pipe to awk command
Question
I have files in a directory; they are named using YYYY_MM_DD:
-rw-r--r-- 1 root root 497186 Apr 21 13:17 2012_03_25
-rw-r--r-- 1 root root 490558 Apr 21 13:17 2012_03_26
-rw-r--r-- 1 root root 488797 Apr 21 13:17 2012_03_27
-rw-r--r-- 1 root root 316290 Apr 21 13:17 2012_03_28
-rw-r--r-- 1 root root 490081 Apr 21 13:17 2012_03_29
-rw-r--r-- 1 root root 486621 Apr 21 13:17 2012_03_30
-rw-r--r-- 1 root root 490904 Apr 21 13:17 2012_03_31
-rw-r--r-- 1 root root 491788 Apr 21 13:17 2012_04_01
-rw-r--r-- 1 root root 488630 Apr 21 13:17 2012_04_02
The first column within the file is a number, and I am using the following awk
command to take an average of that first column.
awk -F, '{ x += $1 } END { print x/NR }' MyFile
Using the same command, I can pass two files to awk to get the overall average across both files.
awk -F, '{ x += $1 } END { print x/NR }' File1 File2
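To make the "as a whole" behaviour concrete, here is a small sketch with made-up data (the values and the temporary directory are assumptions for illustration, not taken from the real files). Because NR keeps counting across input files, the result is the mean of all rows pooled together, not the mean of the per-file averages.

```shell
# Hypothetical sample data; the names File1 and File2 mirror the placeholders above.
workdir=$(mktemp -d)
printf '10,a\n20,b\n' > "$workdir/File1"
printf '60,c\n'       > "$workdir/File2"

# NR is the total number of records read from both files (3 here),
# so x/NR is (10 + 20 + 60) / 3.
pooled=$(awk -F, '{ x += $1 } END { print x/NR }' "$workdir/File1" "$workdir/File2")
echo "pooled average: $pooled"

rm -rf "$workdir"
```

With this data the per-file averages would be 15 and 60, so averaging those two numbers would give a different answer than the pooled value.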
What I want to do is this...
I want to get all the files in my directory, and group them per month, then pass all the files for the month to the awk command.
So, going by the sample data above, there are 7 files in March, and I would want all 7 files to be passed to my awk
command like this:
awk -F, '{ x += $1 } END { print x/NR }' File1 File2 File3 File4 File5 File6 File7
Then likewise for April's set.
Recommended answer
Are you wanting to somehow accomplish this with awk alone, or can you use file globbing? For example:
awk -F, '{ x += $1 } END { print x/NR }' 2012_03_[0-3][0-9]
will get all the March files.
You could also use 2012_03*
but that's less specific in its globbing pattern than the above one.
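The difference between the two patterns can be checked before running awk. The sketch below uses a throwaway directory and an invented stray file named 2012_03_summary (both are assumptions for illustration) to show what each glob expands to.

```shell
# Hypothetical directory contents, including one file that is not a daily file.
demo=$(mktemp -d)
touch "$demo/2012_03_25" "$demo/2012_03_26" "$demo/2012_03_summary" "$demo/2012_04_01"

# Expand each pattern inside the directory and capture one name per line.
strict=$(cd "$demo" && printf '%s\n' 2012_03_[0-3][0-9])
loose=$(cd "$demo" && printf '%s\n' 2012_03*)

printf 'strict pattern matches:\n%s\n' "$strict"
printf 'loose pattern matches:\n%s\n' "$loose"

rm -rf "$demo"
```

Here the stricter pattern picks up only the two daily March files, while 2012_03* also picks up 2012_03_summary, which would then be fed into the average.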
Edit
You can use a shell script like this:
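DIR="/tmp/tmp"
# Collect the distinct YYYY_MM prefixes of the files in $DIR, then handle one month per iteration.
for month in $(find "$DIR" -maxdepth 1 -type f | sed 's/.*\/\([0-9]\{4\}_[0-9]\{2\}\).*/\1/' | sort -u); do
    # Average the first column over all of that month's files.
    awk -F, '{ x += $1 } END { print x/NR }' "$DIR/${month}"_[0-3][0-9] > output/dir/SUM_"${month}"
done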
As always, there are a few caveats. File names containing spaces will break it, because the for loop splits the output of find on whitespace. You'll get errors if there are files in the directory that don't conform to the YYYY_MM_DD format, since sed passes their names through unchanged, but that shouldn't affect the results for the months that do match. Let me know if those constraints are not acceptable and I'll think on it a little more.
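If those caveats are a problem, one possible alternative is to let a single awk run do the grouping itself, keyed on the month taken from FILENAME. This is only a sketch under some assumptions: the function name monthly_averages is made up for illustration, the directory is passed as its first argument, and at least one YYYY_MM_DD file exists (otherwise the unexpanded pattern is handed to awk, which will complain that it cannot open it).

```shell
# Print "YYYY_MM average" for every month found in the directory given as $1.
# monthly_averages is a hypothetical helper name, not part of the original answer.
monthly_averages() {
    dir=$1
    # The glob only matches names shaped like YYYY_MM_DD, so stray files are skipped,
    # and quoting "$dir" keeps a directory path with spaces in one piece.
    awk -F, '
        # On the first line of each file, derive the month key from the file name.
        FNR == 1 {
            n = split(FILENAME, parts, "/")
            month = substr(parts[n], 1, 7)    # "2012_03_25" becomes "2012_03"
        }
        # Accumulate the first column and the row count per month.
        { sum[month] += $1; count[month]++ }
        END {
            for (m in sum) print m, sum[m] / count[m]
        }
    ' "$dir"/[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9] | sort
}
```

Calling monthly_averages /tmp/tmp would print one line per month. The trailing sort is there because awk does not guarantee any order for "for (m in sum)". Unlike the loop above, this writes everything to standard output instead of one SUM_ file per month, so redirect or split the output as needed.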