bash script grep using variable fails to find result that actually does exist

Problem description

I have a bash script that iterates over a list of links, curls down an HTML page per link, greps for a particular string format (syntax is: CVE-####-####), removes the surrounding HTML tags (this is a consistent format, no special case handling necessary), searches a changelog file for the resulting string ID, and finally does stuff based on whether the string ID was found or not.

The found string ID is set as a variable. The issue is that when grepping for the variable there are no results, even though I positively know there should be for some of the IDs. Here is the relevant portion of the script:

for link in $(cat links.txt); do
    curl -s "$link" | grep 'CVE-' | sed 's/<[^>]*>//g' | while read cve; do
        echo "$cve"
        grep "$cve" ./changelog.txt
    done
done

If I hardcode a known ID in the grep command, the script finds the ID and returns things as expected. I've tried many variations of grepping on this variable (e.g. exporting it and using command substitution, cat'ing the changelog and piping it to grep, setting the variable directly via command substitution of the curl chain, single and double quotes around the variable, and half a dozen other things).

Am I missing something nuanced about the variable output by the curl | grep | sed chain? When it is echo'd to stdout or appended (>>) to a file, things look fine (a single ID with no odd characters, carriage returns, etc.).

Any hints or alternate solutions would be much appreciated. Thanks!

FYI:

OS X:
$ bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)

The HTML file that I was curl'ing was chock full of carriage returns. Running the script with set -x was helpful because it revealed the true string being grepped: $'CVE-2011-2716\r'.

+ read -r link
+ curl -s http://localhost:8080/link1.html
+ sed -n '/CVE-/s/<[^>]*>//gp'
+ read -r cve
+ grep -q -F $'CVE-2011-2716\r' ./kernelChangelog.txt
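A minimal sketch of why echo was misleading, assuming the variable holds the value shown in the trace above. On a terminal the trailing carriage return just moves the cursor back to the start of the line, so the ID prints normally; its length and a byte dump tell the real story. This uses od -c rather than xxd only because od is part of POSIX.

```bash
#!/usr/bin/env bash
# Assumed value: what read produces from a CRLF-terminated line of the page.
cve=$'CVE-2011-2716\r'

echo "$cve"                  # on a terminal this looks like a plain ID
echo "length: ${#cve}"       # prints "length: 14"; the ID itself is 13 characters
printf '%s' "$cve" | od -c   # od spells out the hidden byte as \r
```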

Investigating from another angle also helped: opening the curled file in vim showed ^M, and running printf %s "$cve" | xxd showed the carriage-return byte 0d appended to the grepped variable. Relying on echo's output to stdout was the wrong way to diagnose this. Writing a simple HTML page with a valid CVE-####-#### and then adding a carriage return (in vim insert mode, type ctrl-v ctrl-m to insert one) creates a sample file that fails with the original script snippet above.
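The sample-file experiment can be reproduced offline. This is a sketch under assumed inputs: the page content, the changelog line, and the function names original_lookup and fixed_lookup are all hypothetical, and cat stands in for curl -s. The only difference between the two functions is the tr -d '\r' stage.

```bash
#!/usr/bin/env bash
# Hypothetical inputs in a temp directory: a page with CRLF line endings
# and a changelog that does mention the ID.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '<html>\r\n<td>CVE-2011-2716</td>\r\n</html>\r\n' > "$workdir/sample.html"
printf 'changelog entry mentioning CVE-2011-2716\n' > "$workdir/changelog.txt"

# The question's pipeline (cat instead of curl): $cve keeps its trailing \r,
# because read's default IFS strips spaces, tabs and newlines, not CRs.
original_lookup() {
  cat "$workdir/sample.html" | grep 'CVE-' | sed 's/<[^>]*>//g' | while read cve; do
    if grep -q "$cve" "$workdir/changelog.txt"; then echo "FOUND"; else echo "NOT FOUND"; fi
  done
}

# The same pipeline with carriage returns deleted before read.
fixed_lookup() {
  cat "$workdir/sample.html" | grep 'CVE-' | sed 's/<[^>]*>//g' | tr -d '\r' | while read -r cve; do
    if grep -q -F "$cve" "$workdir/changelog.txt"; then echo "FOUND"; else echo "NOT FOUND"; fi
  done
}

original_lookup   # NOT FOUND
fixed_lookup      # FOUND
```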

This is pretty standard string sanitization that I should have figured out. The solution is to remove the carriage returns; piping to tr -d '\r' is one way to do that. I'm not sure there is a specific duplicate on Stack Overflow for this series of steps, but in any case here is my now-working script:

while read -r link; do
  curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read -r cve; do
    if grep -q -F "$cve" ./changelog.txt; then
      echo "FOUND: $cve"
    else
      echo "NOT FOUND: $cve"
    fi
  done
done < links.txt
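Piping to tr is one method; another, sketched here, is bash parameter expansion, which avoids an extra process per page. It assumes only a single trailing carriage return needs removing, and strip_cr is a hypothetical helper name. ${var%pattern} removes the shortest matching suffix, and $'\r' is bash's ANSI-C quoting for a carriage return, so a clean ID passes through unchanged.

```bash
#!/usr/bin/env bash
# Remove at most one trailing carriage return from the first argument.
strip_cr() {
  local clean
  clean=${1%$'\r'}   # plain assignment: $'\r' is expanded, no word splitting
  printf '%s\n' "$clean"
}

# Inside a read loop the same expansion can be applied directly to $cve.
printf 'CVE-2011-2716\r\n' | while read -r cve; do
  cve=${cve%$'\r'}
  echo "${#cve}"     # 13: the carriage return is gone
done
```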

Recommended answer

HTML files can contain carriage returns at the ends of lines; you need to filter those out.

curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read -r cve; do

Notice that there's no need to use grep: you can use a regular-expression address filter in the sed command. (You can also remove characters within sed itself, but doing this for \r is cumbersome, so I piped to tr instead.)
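For comparison, a sketch of doing the removal inside sed, using a hypothetical CRLF sample string in place of the curl output. POSIX does not define \r as a sed regex escape, so a literal carriage return is spliced into the sed script from the shell, which is the cumbersome part. Unlike the answer's s///gp form, this variant prints every line that matches CVE-, whether or not a tag was removed.

```bash
#!/usr/bin/env bash
# Hypothetical CRLF input standing in for the curl output.
html=$'<html>\r\n<td>CVE-2011-2716</td>\r\n</html>\r\n'

# A literal carriage return, inserted into the sed script below; \$ keeps
# the end-of-line anchor from being expanded by the shell.
cr=$'\r'
printf '%s' "$html" | sed -n "/CVE-/{s/<[^>]*>//g;s/${cr}\$//;p;}"
```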
