Grep thinks text file is binary, but it isn't
Problem description
I came across a .cpp file in our codebase that grep treats as binary. So I can't grep it like a text file, which is annoying and obviously not how things ought to be. I want to know why grep thinks the file is binary and fix the issue.
I tried to find any out-of-the-ordinary characters using the command
grep -Pna --color -r "[\x00-\x08]|[\x10-\x19]|[\x80-\xFF]" test.cpp
but it doesn't yield any matches.
How can I figure out the cause of this problem?
I should mention that I'm using Git Bash on Windows.
Output of locale:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
Solution
Since you're using MS Windows, it's possible that the test.cpp file is encoded using either UTF-16 (common in recent versions of Windows) or Windows-1252 (CP-1252) as its character encoding (perhaps because of a typographic quote in one of the comments).
When your locale is set to UTF-8 and grep detects byte sequences that are invalid for that locale, it assumes that the file is binary. A quick way around this issue is to get grep to use the C locale by temporarily setting the LC_ALL environment variable when running the grep command:
LC_ALL=C grep pattern test.cpp
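To narrow down which bytes are responsible, one option is to search for raw non-ASCII bytes under the C locale and to ask iconv whether the file is valid UTF-8. The following is only a sketch under assumptions: it builds a hypothetical sample file containing Windows-1252 curly quotes rather than using the real test.cpp, and it assumes GNU grep with -P support and iconv are installed.

```shell
# Hypothetical sample standing in for test.cpp: a comment with CP-1252
# curly quotes (bytes 0x93 and 0x94, written here in octal).
sample=$(mktemp)
printf 'int x; // \223quoted\224\n' > "$sample"

# Under LC_ALL=C each byte is one character, so -P can match raw high bytes.
# -a forces text mode; -c counts the matching lines.
hits=$(LC_ALL=C grep -caP '[\x80-\xFF]' "$sample")
echo "lines with non-ASCII bytes: $hits"

# iconv exits with a non-zero status when the input is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 "$sample" > /dev/null 2>&1; then
    valid_utf8=yes
else
    valid_utf8=no
fi
echo "valid UTF-8: $valid_utf8"

rm -f "$sample"
```

On the real file, replacing -c with -n in the grep call would list the offending lines with their line numbers.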
A better long-term solution would be to convert the text files (using iconv or your favourite text editor) to use UTF-8 as their character encoding.
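A conversion with iconv might look like the sketch below. It assumes the source encoding is Windows-1252 (named CP1252 here); if the file is actually UTF-16, the -f argument would need to be UTF-16 instead. The sample file and variable names are hypothetical, and the output goes to a separate file so the original is not overwritten.

```shell
# Hypothetical input standing in for test.cpp, stored as Windows-1252.
src=$(mktemp)
dst=$(mktemp)
printf 'int x; // \223quoted\224\n' > "$src"

# Convert from the assumed source encoding to UTF-8.
iconv -f CP1252 -t UTF-8 "$src" > "$dst"

# Sanity check: the converted file should now pass as valid UTF-8.
if iconv -f UTF-8 -t UTF-8 "$dst" > /dev/null 2>&1; then
    converted_ok=yes
else
    converted_ok=no
fi
echo "converted file is valid UTF-8: $converted_ok"

rm -f "$src" "$dst"
```

Once the converted file has been reviewed, it can replace the original, after which grep in a UTF-8 locale should treat it as text.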