bash script grep using variable fails to find result that actually does exist

Problem description

I have a bash script that iterates over a list of links, curls the HTML page for each link, greps for a particular string format (syntax: CVE-####-####), removes the surrounding HTML tags (the format is consistent, so no special-case handling is necessary), searches a changelog file for the resulting string ID, and finally acts on whether the ID was found.

The found string ID is stored in a variable. The issue is that grepping for the variable returns no results, even though I know for certain that some of the IDs are in the changelog. Here is the relevant portion of the script:

for link in $(cat links.txt); do
    curl -s "$link" | grep 'CVE-' | sed 's/<[^>]*>//g' | while read cve; do
        echo "$cve"
        grep "$cve" ./changelog.txt
    done
done

If I hardcode a known ID in the grep command, the script finds the ID and returns results as expected. I've tried many variations of grepping on this variable (e.g. exporting it and using command substitution, cat'ing the changelog and piping to grep, setting the variable directly via command substitution of the curl chain, single and double quotes around the variable, and half a dozen other things).

Am I missing something subtle about the variable produced by the curl | grep | sed chain? When it is echoed to stdout or appended (>>) to a file, things look fine (a single ID with no odd characters, carriage returns, etc.).

Any hints or alternate solutions would be much appreciated. Thanks!

FYI:

OS X: $ bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)

Edit:

The HTML file I was curling was chock full of carriage returns. Running the script with set -x helped because it revealed the true string being grepped for: $'CVE-2011-2716\r'.

+ read -r link
+ curl -s http://localhost:8080/link1.html
+ sed -n '/CVE-/s/<[^>]*>//gp'
+ read -r cve
+ grep -q -F $'CVE-2011-2716\r' ./kernelChangelog.txt

Investigating from another angle, opening the curled file in vim showed ^M, and running printf %s "$cve" | xxd also showed the carriage return hex code 0d appended to the extracted variable. Relying on echo output was the wrong way to diagnose this. To reproduce, write a simple HTML page with a valid CVE-####-#### and add a carriage return after it (in vim insert mode, type ctrl-v ctrl-m to insert one); the resulting sample file fails with the original script snippet above.
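The diagnosis above can be reproduced without curl or vim. The following is a minimal sketch, assuming a throwaway changelog file and a hand-built variable; the temporary path, the changelog text, and the variable names `dirty` and `hardcoded` are made up for illustration. It uses od -c rather than xxd only because od is specified by POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: a one-line changelog and an ID with a trailing CR,
# mimicking what the curl | grep | sed chain produced.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/crdemo.XXXXXX")
printf 'Fixed CVE-2011-2716 in the DHCP client\n' > "$tmpdir/changelog.txt"
cve=$'CVE-2011-2716\r'

# On a terminal the CR is invisible, so this looks like a clean ID.
echo "$cve"

# A byte dump makes the trailing \r visible.
printf '%s' "$cve" | od -c

# grep -F searches for the ID *plus* the CR, which the changelog does not contain.
if grep -q -F "$cve" "$tmpdir/changelog.txt"; then dirty=found; else dirty=missing; fi

# The hardcoded ID (no CR) matches, as described in the question.
if grep -q -F 'CVE-2011-2716' "$tmpdir/changelog.txt"; then hardcoded=found; else hardcoded=missing; fi

echo "variable with CR: $dirty, hardcoded ID: $hardcoded"
rm -rf "$tmpdir"
```

The same check with set -x prints the argument as $'CVE-2011-2716\r', which is how the trace above exposed the problem.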

This is pretty standard string sanitization that I should have figured out. The solution is to remove the carriage returns; piping through tr -d '\r' is one way to do that. I'm not sure there is a specific duplicate on SO for this series of steps, but in any case here is my now-working script:

while read -r link; do
  curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read -r cve; do
    if grep -q -F "$cve" ./changelog.txt; then
      echo "FOUND: $cve";
    else
      echo "NOT FOUND: $cve";
    fi;
  done
done < links.txt
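As an alternative to tr, the carriage return can also be dropped inside the loop with a parameter expansion. This is only a sketch under stated assumptions: the helper name `lookup_cve`, the temporary changelog, and the sample IDs are hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look up one ID in a changelog, ignoring a trailing CR.
lookup_cve() {
  local cve=$1
  local changelog=$2
  cve=${cve%$'\r'}            # strip one trailing carriage return, if present
  if grep -q -F -- "$cve" "$changelog"; then
    echo "FOUND: $cve"
  else
    echo "NOT FOUND: $cve"
  fi
}

# Made-up sample data standing in for changelog.txt and the curl output.
changelog=$(mktemp "${TMPDIR:-/tmp}/changelog.XXXXXX")
printf '%s\n' '* Fix CVE-2011-2716' > "$changelog"

results=$(printf 'CVE-2011-2716\r\nCVE-2099-0001\r\n' |
  while IFS= read -r line; do
    lookup_cve "$line" "$changelog"
  done)

echo "$results"
rm -f "$changelog"
```

The ${var%pattern} expansion removes only a trailing match, so an ID without a carriage return passes through unchanged.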


Recommended answer

HTML files can contain carriage returns at the ends of lines; you need to filter those out:

curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read cve; do

Notice that there's no need to use grep; you can use a regular expression filter in the sed command. (You can also remove characters within sed itself using its s command, but matching \r that way is cumbersome, so I piped to tr instead.)
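For comparison, here is a sketch of the sed-only variant that the parenthetical calls cumbersome. It assumes bash, because it relies on $'...' quoting to place a literal carriage return in the sed script rather than depending on sed understanding \r itself. The sample HTML and the variable names `html` and `ids` are made up for illustration.

```shell
#!/usr/bin/env bash
# Made-up sample input with CRLF line endings, standing in for the curl output.
html=$'<td>CVE-2011-2716</td>\r\n<td>unrelated</td>\r\n'

# bash expands \r inside $'...' to a real CR before sed sees the script.
# For lines containing CVE-: strip tags, strip a trailing CR, then print.
ids=$(printf '%s' "$html" | sed -n $'/CVE-/{s/<[^>]*>//g;s/\r$//;p;}')

# Byte dump to confirm that no \r remains in the extracted ID.
printf '%s' "$ids" | od -c
```

Piping to tr -d '\r' does the same job with less quoting, which is the trade-off the answer describes.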
