Grep thinks a text file is binary, but it isn't
Question
I came across a .cpp file in our codebase that grep sees as binary. So I can't grep it like a text file, which is annoying and obviously not how things ought to be. I want to know why grep thinks the file is binary and fix the issue.
I tried to find any characters out of the ordinary using the command
grep -Pna --color -r "[\x00-\x08]|[\x10-\x19]|[\x80-\xFF]" test.cpp
but it doesn't yield any matches.
How can I figure out the cause of this problem?
I should mention that I'm using Git Bash on Windows.
Output of locale:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
Answer
Since you’re using MS Windows, it’s possible that the test.cpp file is encoded using either UTF-16 (common in recent versions of Windows) or Windows-1252 (CP-1252) as its character encoding. In the Windows-1252 case the culprit may be something as small as a typographic quote in one of the comments, stored as a single byte that is not valid UTF-8.
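One way to check which of these applies is to look at the raw bytes. The sketch below is a minimal, hypothetical helper (detect_encoding_problem and show_non_utf8_lines are illustrative names, not existing tools); it assumes head, od, tr, cmp, iconv and grep are on the PATH, as they are in Git Bash. It checks for a UTF-16 byte-order mark first, then for NUL bytes, then for invalid UTF-8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess why grep might treat FILE as binary.
# Checks, in order: a UTF-16 byte-order mark, NUL bytes, invalid UTF-8.
detect_encoding_problem() {
  local file=$1 bom

  # First two bytes as hex, e.g. "fffe" for a UTF-16LE byte-order mark.
  bom=$(head -c 2 -- "$file" | od -An -tx1 | tr -d ' \n')
  case $bom in
    fffe) echo "UTF-16LE (byte-order mark found)"; return 0 ;;
    feff) echo "UTF-16BE (byte-order mark found)"; return 0 ;;
  esac

  # NUL bytes do not appear in UTF-8 source code but are common in UTF-16.
  if ! tr -d '\000' < "$file" | cmp -s - "$file"; then
    echo "contains NUL bytes (possibly UTF-16 without a byte-order mark)"
    return 0
  fi

  # iconv exits with a non-zero status when the input is not valid UTF-8.
  if ! iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
    echo "invalid UTF-8 (possibly Windows-1252)"
    return 0
  fi

  echo "valid UTF-8"
}

# Print every line that is not valid UTF-8, prefixed with its line number.
# Assumes the en_US.UTF-8 locale from the question is available: in a UTF-8
# locale '.' does not match an invalid byte, so -x '.*' fails on such lines
# and -v selects them; -a stops grep from hiding them as binary data.
show_non_utf8_lines() {
  LC_ALL=en_US.UTF-8 grep -naxv '.*' -- "$1"
}
```

Running detect_encoding_problem test.cpp, and then show_non_utf8_lines test.cpp if it reports invalid UTF-8, should point at the offending comment.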
When your locale is set to UTF-8 and grep detects byte sequences that are invalid in that locale's encoding, it assumes that the file is binary. A quick way around this issue is to make grep use the C locale by temporarily setting the LC_ALL environment variable when running the grep command:
LC_ALL=C grep pattern test.cpp
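The same workaround can be wrapped in a small function. This is a minimal sketch; grep_c_locale is a hypothetical name. One caveat: the C locale only removes encoding errors as a reason for the binary verdict. grep also treats NUL bytes as binary data in every locale, and a UTF-16 file is full of them, so for UTF-16 files converting to UTF-8 is the real fix.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run grep with LC_ALL=C for this one command only.
# In the C locale grep treats every byte as a valid character, so stray
# Windows-1252 bytes no longer make it report a binary match. Matching is
# byte by byte, which is fine for ASCII patterns such as C++ identifiers.
# NUL bytes (as in UTF-16 files) still count as binary data in any locale.
grep_c_locale() {
  LC_ALL=C grep "$@"
}
```

Because the function passes all arguments through, grep_c_locale -rn pattern . searches the whole working tree the same way.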
A better long-term solution would be to convert the text files (using iconv or your favourite text editor) to use UTF-8 as their character encoding.
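A minimal sketch of such a conversion with iconv, assuming you already know the file's current encoding. convert_to_utf8 is a hypothetical helper name; CP1252 and UTF-16 are iconv encoding names, and iconv -l lists the names your build accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert FILE in place from encoding FROM to UTF-8.
# FROM is an iconv encoding name such as CP1252 or UTF-16.
convert_to_utf8() {
  local file=$1 from=$2 tmp status=0
  tmp=$(mktemp "${file}.XXXXXX") || return 1

  # Convert into a temporary file first; the original is only overwritten
  # if iconv succeeds. Copying back with cat keeps the original's permissions.
  iconv -f "$from" -t UTF-8 "$file" > "$tmp" && cat -- "$tmp" > "$file" || status=1

  rm -f -- "$tmp"
  return "$status"
}
```

For example, convert_to_utf8 test.cpp CP1252 for a Windows-1252 file, or convert_to_utf8 test.cpp UTF-16 for a UTF-16 file that starts with a byte-order mark.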