How to make my shell script check every folder in a directory for a word and then rank the outputs?

Published: 2016/8/3 12:45:56. Tags: bash, shell, unix, grep

Problem description

I have a folder reviews_folder that contains lots of files, such as hotel_217616.dat. I have written a script countreviews.sh to check the number of times the word "Author" appears in each file and then print the number out for each respective file. Here is my script:

grep -r "<Author>" "#1"

I cannot hard-code reviews_folder in the shell script; it must be passed as a command-line argument, hence #1. The number of times my word appears in each file must then be ranked from highest to lowest, for example:

-- run script --
49
23
17

However, when I run my script it says "#1: No such file or directory"; why isn't it replacing #1 with reviews_folder when I type:

./countreviews.sh reviews_folder

My countreviews.sh sits in the same directory as reviews_folder (which contains the files I will be checking), if that matters.

Recommended answer

First off, the positional parameter is $1 and not #1.
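A minimal sketch of the difference, assuming bash or any POSIX shell. Positional parameters behave the same inside a function as they do in a script, so the hypothetical function show_args stands in for countreviews.sh here:

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for countreviews.sh:
# positional parameters work the same way inside a function.
show_args() {
  echo "literal: #1"     # "#1" is plain text; nothing is expanded
  echo "expanded: $1"    # $1 expands to the first argument
}

show_args reviews_folder
# literal: #1
# expanded: reviews_folder
```
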

Secondly, your script doesn't really "count the number of times the word Author appears"; it looks literally for <Author>, including the angle brackets.

I assume you wanted word boundaries, as in \<Author\>.
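To illustrate, the sketch below counts matching lines in three made-up sample lines. It assumes GNU grep, where \< and \> match the empty string at the start and end of a word:

```shell
#!/usr/bin/env bash
# Three made-up input lines (hypothetical sample data).
sample() { printf '%s\n' 'Author: Ann' '<Author>Bob</Author>' 'Authority'; }

# Literal pattern: only lines containing the text "<Author>" match.
literal=$(sample | grep -c '<Author>')

# Word-boundary pattern (GNU grep): "Author" as a whole word,
# so "Authority" does not match.
word=$(sample | grep -c '\<Author\>')

echo "literal: $literal"   # literal: 1
echo "word: $word"         # word: 2
```
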

grep -r just lists all matching lines, each prefixed with its file name. You want only the counts, sorted. To get them, you can use:

grep -rwch 'Author'

-w searches for word matches
-c returns a match count per file
-h suppresses writing the file name
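Here is a self-contained sketch of what those flags print, assuming GNU grep. It builds a throwaway directory with mktemp; the file names and contents are made up for illustration. Note that -c also prints 0 for files with no match, and the order of the counts follows the directory traversal, so it is not guaranteed:

```shell
#!/usr/bin/env bash
# Build a throwaway directory with made-up review files.
dir=$(mktemp -d)
printf 'Author\nAuthor Author\n' > "$dir/hotel_1.dat"   # 2 matching lines
printf 'Author\n'                > "$dir/hotel_2.dat"   # 1 matching line
printf 'no match here\n'         > "$dir/hotel_3.dat"   # 0 matching lines

# One count per file, without file names (-h); order is not guaranteed.
grep -rwch 'Author' "$dir"

# Without -h, grep prefixes each count with its file name.
grep -rwc 'Author' "$dir"
```

When you are done experimenting, rm -r "$dir" removes the directory.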

And to sort the output, you pipe it to sort:

grep -rwch 'Author' | sort -nr

-n is for "numerical sort", and -r for "reverse", so the largest number is first.
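Put together, the body of countreviews.sh could look like the sketch below. It is wrapped in a function (rank_by_lines is a hypothetical name) so it can be tried without a separate file, and it passes the directory argument to grep explicitly rather than relying on GNU grep -r searching the current directory when no path is given. The ${1:?...} guard is an addition that stops with a usage message when no directory is supplied:

```shell
#!/usr/bin/env bash
# rank_by_lines is a hypothetical name; its body is what countreviews.sh needs.
rank_by_lines() {
  # Abort with a usage message if no directory argument was passed.
  local dir=${1:?usage: countreviews.sh DIRECTORY}
  # Count matching lines per file, drop file names, rank highest first.
  grep -rwch 'Author' "$dir" | sort -nr
}

# Demo on a throwaway directory with made-up files.
demo=$(mktemp -d)
printf 'Author\n'                 > "$demo/hotel_a.dat"
printf 'Author\nAuthor\nAuthor\n' > "$demo/hotel_b.dat"
printf 'Author\nAuthor\n'         > "$demo/hotel_c.dat"
rank_by_lines "$demo"
# 3
# 2
# 1
```

In the real script, the two lines inside the function are all you need, and it is invoked as ./countreviews.sh reviews_folder.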

Notice that this still only counts how many lines matched "Author": a line with five matches is counted only once by grep -c.
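A small sketch of the difference between counting lines and counting occurrences, using one made-up line that contains the word five times:

```shell
#!/usr/bin/env bash
# One line containing the word five times (made-up input).
line='Author Author Author Author Author'

# -c counts matching LINES, so this is 1.
lines=$(printf '%s\n' "$line" | grep -cw 'Author')

# -o prints every match on its own line; wc -l counts them, giving 5.
# $(( )) strips the padding some wc implementations add.
occurrences=$(( $(printf '%s\n' "$line" | grep -ow 'Author' | wc -l) ))

echo "lines: $lines, occurrences: $occurrences"   # lines: 1, occurrences: 5
```
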

To properly count every single occurrence, you could do this:

find . -type f -exec bash -c 'grep -wo "Author" "$1" | wc -l' bash {} \; | sort -nr

find . -type f recursively finds all regular files.
-exec runs a command for each file found. Because that command contains a pipe, we have to spawn a new shell with bash -c.
grep -wo "Author" prints every whole-word match of Author on a separate line; wc -l then counts those lines.
Once this has run for every file, sort -nr sorts the results, again highest first.
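The -exec ... \; form starts one bash process per file. A variant, sketched below, uses -exec ... {} + (specified by POSIX) so that each bash invocation receives many file names and loops over them. This batching is a swapped-in technique, not part of the answer above; the function name count_occurrences is hypothetical, and it prints one bare number per file as the question asks:

```shell
#!/usr/bin/env bash
# count_occurrences is a hypothetical name; $1 is the directory to search.
count_occurrences() {
  find "$1" -type f -exec bash -c '
    for f in "$@"; do
      # -o: one line per whole-word match; wc -l counts them.
      # $(( )) normalizes any padding wc adds.
      echo $(( $(grep -ow "Author" "$f" | wc -l) ))
    done
  ' bash {} + | sort -nr
}

# Demo on a throwaway directory with made-up files.
demo=$(mktemp -d)
printf 'Author Author Author\n' > "$demo/hotel_x.dat"   # 3 occurrences on 1 line
printf 'Author\nAuthor\n'       > "$demo/hotel_y.dat"   # 2 occurrences
printf 'nothing\n'              > "$demo/hotel_z.dat"   # 0 occurrences
count_occurrences "$demo"
# 3
# 2
# 0
```

Passing file names as arguments ("$@") instead of pasting {} into the quoted script also keeps names containing spaces or quotes intact.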

