Recursively search directory of binary files for hexadecimal sequence?


Question

The commands I'm currently using to search for some hex values (say 0A 8b 02) are:

find . -type f -not -name "*.png" -exec xxd -p {} \; | grep "0a8b02" || xargs -0 -P 4

Is it possible to improve this, given the following goals:

  • search files recursively
  • display the offset and filename
  • exclude files with certain extensions (the example above does not search .png files)
  • speed: the search needs to handle 200,000 files (around 50KB to 1MB each) in a directory totaling ~2GB.

I'm not confident that xargs is actually running on 4 processors. I'm also having difficulty printing the filename when grep finds a match, since grep's input is piped from xxd. Any suggestions?

Recommended answer

If:

  • you have GNU grep
  • and the hex bytes you search for never contain newlines (0xa) [1]
    • If they contain NUL (0x0), you must provide the grep search string via a file (-f) rather than as a direct argument.

then the following command would get you there, using the example of searching for 0e 8b 02:

LC_ALL=C find . -type f -not -name "*.png" -exec grep -FHoab $'\x0e\x8b\x02' {} + |
  LC_ALL=C cut -d: -f1-2

The grep command produces output lines as follows:

<filename>:<byte-offset>:<matched-bytes>
      

which LC_ALL=C cut -d: -f1-2 then reduces to <filename>:<byte-offset>.
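
As a rough sketch, the pipeline above can be wrapped in a small function so that the directory and byte sequence are parameters. The function name search_bytes and its argument order are made up for illustration, and the sketch assumes GNU grep and a find that supports -not.

```shell
# Hypothetical helper (the name is an assumption): wraps the find/grep/cut pipeline.
# Requires GNU grep for correct per-match byte offsets.
search_bytes() {
    # $1: directory to search
    # $2: literal byte string to look for (no newline or NUL bytes)
    LC_ALL=C find "$1" -type f -not -name '*.png' \
        -exec grep -FHoab -e "$2" {} + |
        LC_ALL=C cut -d: -f1-2   # keep only <filename>:<byte-offset>
}

# Example call in bash, zsh or ksh (ANSI C quoting expands the escapes):
#   search_bytes . $'\x0e\x8b\x02'
```

Passing the pattern with -e keeps grep from mistaking a sequence that starts with a dash for an option. Filenames that themselves contain a colon would confuse the cut step, so this sketch assumes they do not occur.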

The command almost works with BSD grep, except that the byte offset it reports is always the start of the line on which the pattern matched.
In other words, the byte offset is only correct if no newlines precede the match in the file.
Also, BSD grep doesn't support NUL (0x0) bytes as part of the search string, not even when the string is provided via a file with -f.
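
For the NUL case mentioned in the conditions, a minimal sketch of the -f approach with GNU grep might look like the following. The function name search_bytes_from_file is hypothetical, and the pattern file is assumed to live outside the searched directory so that it does not match itself.

```shell
# Hypothetical helper (the name is an assumption): reads the byte sequence from a file,
# which allows NUL bytes in the sequence. Assumes GNU grep.
search_bytes_from_file() {
    # $1: directory to search
    # $2: file containing the raw bytes to look for (one pattern, no newline bytes)
    LC_ALL=C find "$1" -type f -not -name '*.png' \
        -exec grep -FHoab -f "$2" {} + |
        LC_ALL=C cut -d: -f1-2   # keep only <filename>:<byte-offset>
}

# Example: write the bytes 00 8b 02 to a pattern file using octal escapes, then search.
#   pattern_file=$(mktemp)
#   printf '\000\213\002' > "$pattern_file"
#   search_bytes_from_file . "$pattern_file"
```

printf with octal escapes is used for the pattern file because a shell string cannot hold a NUL byte, which is why the sequence cannot be passed as a direct argument.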

  • Note that there will be no parallel processing, only a few grep invocations: find's -exec ... +, like xargs, passes as many filenames to grep at once as will fit on a command line.
  • Letting grep search for the byte sequence directly removes the need for xxd:
    • The sequence is specified as an ANSI C-quoted string, so the shell expands the escape sequences to literal bytes, and grep can then search for the result as a literal string (via -F), which is faster.
      ANSI C quoting is documented in the bash manual, but it works in zsh (and ksh) too.
    • A GNU grep alternative is to use -P (support for PCREs, Perl-compatible regular expressions) with escape sequences that are not pre-expanded by the shell, but this will be slower: grep -PHoab '\x0e\x8b\x02'
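
If parallel processing is wanted after all, as in the question's xargs -P 4 attempt, one possible variant hands the file list to xargs instead of -exec ... +. This is a sketch rather than part of the original answer: the function name search_bytes_parallel, the default of 4 processes and the batch size of 1000 files are all assumptions, and -print0, -0 and -P are extensions found in GNU and BSD find and xargs.

```shell
# Hypothetical helper (the name is an assumption): runs several grep processes in parallel.
# Assumes GNU grep plus a find and xargs that support -print0, -0 and -P.
search_bytes_parallel() {
    # $1: directory to search
    # $2: literal byte string to look for (no newline or NUL bytes)
    # $3: number of parallel grep processes (defaults to 4)
    find "$1" -type f -not -name '*.png' -print0 |
        LC_ALL=C xargs -0 -P "${3:-4}" -n 1000 grep -FHoab -e "$2" |
        LC_ALL=C cut -d: -f1-2   # keep only <filename>:<byte-offset>
}

# Example call in bash, zsh or ksh:
#   search_bytes_parallel . $'\x0e\x8b\x02' 4
```

The order of the output lines is not deterministic, because the grep processes finish independently; pipe the result through sort if a stable order matters. Whether this is actually faster than the single-process command depends on whether the workload is bound by disk or by CPU.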

If it's sufficient to find at most 1 match in a given input file, add -m 1.

[1] Newlines cannot be used, because grep always treats newlines in a search-pattern string as separators between multiple search patterns. Also, grep is line-based, so you can't match across lines. GNU grep's --null-data option, which splits the input by NUL bytes instead, could help, but only if your search byte sequence doesn't itself contain NUL bytes. You would also have to represent your byte values as escape sequences in a regex and combine that with -P, because you need the escape sequence \n in lieu of actual newlines.

[2] -o is needed to make -b report the byte offset of the match rather than that of the beginning of the line (as stated, BSD grep unfortunately always does the latter). It is also beneficial to report only the matches themselves here: printing the entire line would result in unpredictably long output lines, given that binary files have no concept of lines. Either way, outputting bytes from a binary file may cause strange rendering behavior in the terminal.
