How to make my shell script check every folder in a directory for a word and then rank the outputs?


Problem description

I have a folder reviews_folder that contains lots of files, such as hotel_217616.dat. I have written a script countreviews.sh to check the number of times the word "Author" appears in each file and then print the number out for each respective file. Here is my script:

grep -r "<Author>" "#1"

I cannot hard-code reviews_folder in the script; it must be taken as a command-line argument, hence #1. The number of times my word appears in each file must then be ranked from highest to lowest, for example:

-- run script --
49
23
17

However, when I run my script it says "#1: No such file or directory"; why isn't it replacing #1 with reviews_folder when I type:

./countreviews.sh reviews_folder

My countreviews.sh sits in the same directory as reviews_folder, which contains the files I will be checking, if that matters.

Recommended answer

First off, the positional parameter is $1 and not #1.
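To make the difference concrete, here is a minimal sketch. The function name show_first_arg is made up for this illustration; the point is only that "$1" is expanded to the first argument while "#1" is passed along as a literal string.

```shell
#!/bin/bash
# Hypothetical helper, named only for this illustration.
show_first_arg() {
    # "$1" expands to the first argument of the function (or script).
    printf 'expanded: %s\n' "$1"
    # "#1" is not special inside quotes; it stays the two characters # and 1.
    printf 'literal:  %s\n' "#1"
}

show_first_arg reviews_folder
```

This is why grep reported "#1: No such file or directory": it was asked to search a path literally named #1.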

Secondly, your script doesn't really "count the number of time the word Author appears"; it looks literally for <Author>, including the angle brackets.

I assume you wanted word boundaries, as in \<Author\>.
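A small sketch of the difference, using made-up sample lines and assuming GNU grep (where \< and \> are word-boundary anchors in basic regular expressions):

```shell
#!/bin/bash
# Sample input invented for this illustration.
sample=$(printf '%s\n' '<Author>Alice' 'Author: Bob' 'Authors: many')

# Unescaped < and > are ordinary characters: only the literal tag matches.
literal_count=$(printf '%s\n' "$sample" | grep -c '<Author>')

# \< and \> anchor at word boundaries: "Author" as a whole word,
# so "Authors" is not counted.
word_count=$(printf '%s\n' "$sample" | grep -c '\<Author\>')

echo "literal: $literal_count, whole word: $word_count"
```

If the files really do contain a literal <Author> tag, the unescaped pattern may be exactly what is wanted; the escaped form is only needed for whole-word matching.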

grep -r just lists all matching lines, prepended by filenames. You want only the count, and sorted. To do this, you can do

grep -rwch 'Author'


  • -w searches for word matches
  • -c returns a match count per file
  • -h suppresses writing the file name

And to sort the output, you pipe it to sort:

grep -rwch 'Author' | sort -nr

-n is for "numerical sort" and -r is for "reverse", so the largest number comes first.
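Putting the pieces together, one way countreviews.sh could look is sketched below. The function name rank_matching_lines and the throwaway demo files are assumptions added for illustration; the grep and sort options are the ones described above, with the folder passed in as "$1".

```shell
#!/bin/bash
# Hypothetical helper: print per-file counts of lines containing the
# word "Author" under the folder given as $1, largest count first.
rank_matching_lines() {
    grep -rwch 'Author' "$1" | sort -nr
}

# Throwaway demo data (invented), so the sketch can be run anywhere.
demo_dir=$(mktemp -d)
printf 'Author a\nAuthor b\nAuthor c\n' > "$demo_dir/hotel_1.dat"
printf 'Author a\n'                     > "$demo_dir/hotel_2.dat"
printf 'Author a\nAuthor b\n'           > "$demo_dir/hotel_3.dat"

rank_matching_lines "$demo_dir"
rm -rf "$demo_dir"
```

In the real script the demo section would be dropped and the last line would simply be rank_matching_lines "$1", so that ./countreviews.sh reviews_folder searches the folder named on the command line.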

Notice how this still only counts how many lines matched "Author"; if a line contains five matches, grep -c counts it only once.
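A quick way to see the difference on a single made-up line that contains the word three times (variable names are for illustration only):

```shell
#!/bin/bash
# One invented input line with three occurrences of the word.
line='Author Author Author'

# -c counts matching lines, so this line counts once.
lines_matched=$(printf '%s\n' "$line" | grep -wc 'Author')

# -o prints every match on its own line; wc -l then counts those lines.
occurrences=$(printf '%s\n' "$line" | grep -wo 'Author' | wc -l)

echo "lines matched: $lines_matched, occurrences: $occurrences"
```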

To properly count every single occurrence, you could do this:

find . -type f -exec bash -c 'grep -wo "Author" {} | wc -l' \; | sort -nr


  • find . -type f recursively finds all files.
  • -exec executes a command for each file found. Because we use a pipe in that command, we have to spawn a subshell with bash -c.
  • grep -wo "Author" {} | wc -l finds every match of Author and prints each on a separate line; wc -l then counts the lines.
  • After this has run for all files, sort -nr sorts the results again.
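A variant of the same pipeline, sketched under two assumptions that are not in the answer above: the folder is taken from "$1" rather than the current directory, and each file name is handed to bash -c as a positional parameter instead of being pasted into the command string, which tends to be more robust for names containing spaces or quotes. The function name rank_occurrences is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count every occurrence of the word "Author" in
# each file under the folder given as $1, largest count first.
rank_occurrences() {
    # The "_" fills $0 of the inner shell, so {} arrives there as "$1".
    find "$1" -type f \
        -exec bash -c 'grep -wo "Author" "$1" | wc -l' _ {} \; | sort -nr
}

# Throwaway demo data (invented); note the space in the first file name.
demo_dir=$(mktemp -d)
printf 'Author Author\nAuthor\n' > "$demo_dir/hotel 1.dat"
printf 'Author\n'                > "$demo_dir/hotel_2.dat"

rank_occurrences "$demo_dir"
rm -rf "$demo_dir"
```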