Recursively search directory of binary files for hexadecimal sequence?
Question
The current command I'm using to search for some hex values (say 0A 8b 02) is:
find . -type f -not -name "*.png" -exec xxd -p {} \; | grep "0a8b02" || xargs -0 -P 4
Is it possible to improve this given the following goals:
- search files recursively
- display the offset and filename
- exclude certain files with certain extensions (the example above will not search .png files)
- speed: the search needs to handle 200,000 files (around 50KB to 1MB each) in a directory totaling ~2GB.
I'm not too confident that xargs is working properly across 4 processors. Also, I'm having difficulty printing the filename when grep finds a match, since its input is piped from xxd. Any suggestions?
Recommended answer
IF:
- you have GNU grep
- AND the hex bytes you search for NEVER contain newlines (0xa) [1]
  - If they contain NUL (0x0), you must provide the grep search string via a file (-f) rather than by direct argument.
the following command would get you there, using the example of searching for 0e 8b 02:
LC_ALL=C find . -type f -not -name "*.png" -exec grep -FHoab $'\x0e\x8b\x02' {} + | LC_ALL=C cut -d: -f1-2
The grep command produces output lines as follows:
<filename>:<byte-offset>:<matched-bytes>
which LC_ALL=C cut -d: -f1-2 then reduces to <filename>:<byte-offset>
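To see this output format in action, here is a minimal sketch that builds a throwaway directory and runs the same pipeline against it. The file names (sub/a.bin, skip.png, none.bin) are made up for illustration, GNU grep is assumed, and the search bytes are produced with printf so the script does not depend on the shell supporting $'...' quoting.

```shell
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/sub"
cd "$demo"

# sub/a.bin: 5 filler bytes, then 0e 8b 02 (octal 016 213 002) at byte offset 5.
printf 'xxxxx\016\213\002zzz' > sub/a.bin
# skip.png also contains the sequence, but is excluded by name.
printf '\016\213\002' > skip.png
# none.bin does not contain the sequence at all.
printf 'nothing here' > none.bin

# Same bytes as $'\x0e\x8b\x02' in bash, zsh or ksh.
needle=$(printf '\016\213\002')

result=$(LC_ALL=C find . -type f -not -name "*.png" \
  -exec grep -FHoab "$needle" {} + | LC_ALL=C cut -d: -f1-2)
printf '%s\n' "$result"
```

Only sub/a.bin should be reported, together with the offset of the match.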
The command almost works with BSD grep, except that the byte offset reported is invariably that of the start of the line on which the pattern matched.
In other words: the byte offset will only be correct if no newlines precede a match in the file.
Also, BSD grep doesn't support specifying NUL (0x0) bytes as part of the search string, not even when provided via a file with -f.
- Note that there'll be no parallel processing, but only a few grep invocations, based on using find's -exec ... +, which, like xargs, passes as many filenames as will fit on a command line to grep at once.
- By letting grep search for the byte sequence directly, there is no need for xxd:
  - The sequence is specified as an ANSI C-quoted string, which means that the escape sequences are expanded to literals by the shell, enabling grep to then search for the resulting string as a literal (via -F), which is faster.
    ANSI C quoting is documented in the bash manual, but such strings work in zsh (and ksh) too.
  - A GNU grep alternative is to use -P (support for PCREs, Perl-compatible regular expressions) with non-pre-expanded escape sequences, but this will be slower: grep -PHoab '\x{0e}\x{8b}\x{02}'
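If parallelism matters more than keeping the number of grep invocations minimal, one possible variation (my assumption, not part of the command above) is to hand the file list to xargs with -P. The sketch below uses made-up file names and a deliberately tiny batch size of 2 so that several grep processes are started; a real run would use a much larger -n value. -P is an extension found in GNU and BSD xargs, and output from concurrent grep processes can arrive in any order, hence the final sort.

```shell
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

# Sample files: 0e 8b 02 (octal 016 213 002) sits at a different offset in each.
printf '\016\213\002'     > one.bin
printf 'xx\016\213\002'   > two.bin
printf 'xxxx\016\213\002' > three.bin
printf '\016\213\002'     > skipped.png

# Same bytes as $'\x0e\x8b\x02' in bash, zsh or ksh.
needle=$(printf '\016\213\002')

# -print0 / -0 keep unusual filenames intact.
# -n 2 caps each grep call at 2 files; -P 4 runs up to 4 grep processes at once.
parallel_result=$(find . -type f -not -name "*.png" -print0 |
  LC_ALL=C xargs -0 -n 2 -P 4 grep -FHoab "$needle" |
  LC_ALL=C cut -d: -f1-2 | LC_ALL=C sort)
printf '%s\n' "$parallel_result"
```

Whether this is actually faster depends on whether the search is CPU-bound or limited by disk reads, so it is worth timing both variants on the real data set.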
If it's sufficient to find at most 1 match in a given input file, add -m 1.
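For the NUL case mentioned in the conditions at the top, a rough sketch of the -f approach follows. The sequence 00 8b 02 and the file names data.bin and needle.pat are invented for illustration, and GNU grep is assumed. The pattern file is written with printf so that it holds exactly the raw bytes, with no trailing newline.

```shell
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

# data.bin: 3 filler bytes, then 00 8b 02 (octal 000 213 002) at byte offset 3.
printf 'abc\000\213\002def' > data.bin
# needle.pat holds the raw search bytes; a NUL cannot be passed as an argument.
printf '\000\213\002' > needle.pat

nul_result=$(LC_ALL=C grep -FHoab -f needle.pat data.bin |
  LC_ALL=C cut -d: -f1-2)
printf '%s\n' "$nul_result"
```

In the find command, this would mean replacing the quoted search string with -f followed by the path to the pattern file.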
[1] Newlines cannot be used, because grep invariably treats newlines in a search-pattern string as separating multiple search patterns. Also, grep is line-based, so you can't match across lines; GNU grep's --null-data option to split the input by NUL bytes could help, but only if your search byte sequence doesn't also contain NUL bytes; you'd also have to represent your byte values as escape sequences in a regex combined with -P, because you'll need to use the escape sequence \n in lieu of actual newlines.
[2] -o is needed to make -b report the byte offset of the match as opposed to that of the beginning of the line (as stated, BSD grep always does the latter, unfortunately); additionally, it is beneficial to only report the matches themselves here, as an attempt to print the entire line would result in unpredictably long output lines, given that there's no concept of lines in binary files; either way, however, outputting bytes from a binary file may cause strange rendering behavior in the terminal.