How to make my shell script check every folder in a directory for a word and then rank the outputs?
Problem description
I have a folder reviews_folder that contains lots of files, such as hotel_217616.dat. I have written a script countreviews.sh to check the number of times the word "Author" appears in each file and then print that number for each respective file. Here is my script:
grep -r "<Author>" "#1"
I cannot write reviews_folder in the shell code; it must be taken as an argument on the command line, hence #1. The number of times my word appears in each file must then be ranked from highest to lowest, for example:
-- run script --
49
23
17
However, when I run my script it says "#1: No such file or directory". Why isn't it replacing #1 with reviews_folder when I type:
./countreviews.sh reviews_folder
My countreviews.sh sits in the same directory as my reviews_folder, which contains the files I will be checking, if that matters.
Recommended answer
First off, the positional parameter is $1 and not #1.
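To make the difference concrete, here is a minimal sketch. The function name show_first_arg is made up for illustration; inside a function, $1 refers to the function's first argument in the same way that $1 in a script refers to the script's first argument.

```shell
#!/bin/bash
# show_first_arg is a hypothetical helper, not part of the original script.
show_first_arg() {
    echo "expanded: $1"   # $1 is replaced by the first argument
    echo "literal: #1"    # inside quotes, #1 has no special meaning and stays as-is
}

show_first_arg reviews_folder
```

This is why grep complained about a file called #1: the quoted string "#1" was handed to grep unchanged.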
Secondly, your script doesn't really count the number of times the word Author appears; it looks literally for <Author>, including the angle brackets.
I assume you wanted word boundaries, as in \<Author\>.
grep -r just lists all matching lines, each prefixed with its file name. You want only the count, and sorted. To do this, you can use:
grep -rwch 'Author'
- -w searches for whole-word matches
- -c returns a match count per file
- -h suppresses writing the file name
And to sort the output, you pipe it to sort:
grep -rwch 'Author' | sort -nr
-n is for "numerical sort", and -r for "reverse", so the largest number comes first.
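Putting the pieces together, the whole script might look roughly like the sketch below. It assumes the directory is passed as the first argument; the function name count_matching_lines and the sample files are invented for the demonstration. In countreviews.sh itself, the single grep line with the script's own "$1" would be enough.

```shell
#!/bin/bash
# count_matching_lines is a hypothetical name; "$1" is the directory to search.
count_matching_lines() {
    # -r recurse, -w whole word, -c count matching lines per file, -h hide file names
    grep -rwch 'Author' "$1" | sort -nr
}

# Demo on throwaway sample data (file names and contents are made up).
demo_dir=$(mktemp -d)
printf '<Author>a\n<Author>b\n<Author>c\n' > "$demo_dir/hotel_1.dat"
printf '<Author>a\nno match here\n' > "$demo_dir/hotel_2.dat"
count_matching_lines "$demo_dir"
rm -r "$demo_dir"
```

With this sample data, the first file contributes three matching lines and the second one, so the larger count is listed first.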
Notice that grep -c still only counts how many lines matched "Author"; if there is a line with five matches, grep -c counts it only once.
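A quick way to see the difference, using a made-up line that contains the word three times:

```shell
#!/bin/bash
# Sample text invented for illustration: one line, three occurrences.
line='Author Author Author'

# -c counts matching lines, so this single line is counted once.
printf '%s\n' "$line" | grep -wc 'Author'

# -o prints each match on its own line, so wc -l counts occurrences.
printf '%s\n' "$line" | grep -wo 'Author' | wc -l
```

The first command reports 1, the second reports 3.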
To properly count every single occurrence, you could do this:
find . -type f -exec bash -c 'grep -wo "Author" {} | wc -l' \; | sort -nr
- find . -type f recursively finds all files.
- -exec executes a command for each file found. Because we use a pipe in that command, we have to spawn a subshell with bash -c.
- grep -wo "Author" {} | wc -l finds every match of Author and prints it on a separate line; wc -l then counts the lines.
- After this has happened for all files, sort -nr again sorts the results.
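For use inside the script, the same pipeline can take the directory from the command line instead of searching the current directory. The sketch below is one possible variant, not the only way to do it: the function name count_occurrences is made up, and the file name is handed to the inner shell as its own "$1" (the "_" only fills $0) rather than being substituted into the command string, which tends to be more robust with unusual file names.

```shell
#!/bin/bash
# count_occurrences is a hypothetical name; "$1" is the directory to search.
count_occurrences() {
    # The inner bash receives each file as its own "$1"; "_" is a placeholder for $0.
    find "$1" -type f -exec bash -c 'grep -wo "Author" "$1" | wc -l' _ {} \; | sort -nr
}

# Demo on throwaway sample data (file names and contents are made up).
demo_dir=$(mktemp -d)
printf 'Author Author\nAuthor\n' > "$demo_dir/hotel_1.dat"
printf 'Author\n' > "$demo_dir/hotel_2.dat"
count_occurrences "$demo_dir"
rm -r "$demo_dir"
```

Here the first file has three occurrences spread over two lines, so it is counted as 3 rather than 2.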