Grep thinks text file is binary, but it isn't

Views: 41 Published: 2022/1/6 14:05:37 Tags: windows text grep binaryfiles git-bash


Problem description

I came across a .cpp file in our codebase that grep treats as binary, so I can't search it like a normal text file. That is annoying and clearly not how things should be, so I want to find out why grep thinks the file is binary and fix the underlying issue.

I tried to find any unusual characters with this command:

grep -Pna --color -r "[\x00-\x08]|[\x10-\x19]|[\x80-\xFF]" test.cpp

but it doesn't yield any matches.

How can I figure out the cause of this problem?

I should mention that I'm using Git Bash on Windows.

Output of locale:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

Recommended answer

Since you're using MS Windows, the test.cpp file may be encoded in UTF-16 (common in recent versions of Windows) or in Windows-1252 (CP-1252); in the latter case, the culprit could be something like a typographic quote in one of the comments.
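One way to check which of these applies is to look at the raw bytes. The following is only a sketch, assuming GNU grep and a hypothetical sample.cpp that stands in for test.cpp; the sample contains the CP-1252 typographic quote bytes 0x93 and 0x94, which are not valid UTF-8.

```shell
# Hypothetical stand-in for test.cpp: line 2 holds CP-1252 typographic
# quotes (bytes 0x93 and 0x94, written here as octal escapes).
printf 'int main() {\n  // \223quoted\224 comment\n  return 0;\n}\n' > sample.cpp

# Dump the first two bytes in hex. A UTF-16 file usually starts with a
# byte order mark (ff fe or fe ff); this sample does not.
head -c 2 sample.cpp | od -An -tx1

# Collect the numbers of lines that contain any byte outside printable
# ASCII and whitespace. The C locale makes grep work byte by byte, and
# -a forces text mode so matching lines are printed.
suspect_lines=$(LC_ALL=C grep -na '[^[:print:][:space:]]' sample.cpp | cut -d: -f1)
echo "$suspect_lines"
```

Running the search under LC_ALL=C matters here: in a UTF-8 locale, a range such as \x80-\xFF may be interpreted as code points rather than raw bytes, which is one possible reason the command in the question found nothing.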

When your locale is set to UTF-8 and grep detects byte sequences that are invalid in that locale, it assumes the file is binary. A quick way around this is to make grep use the C locale by setting the LC_ALL environment variable just for that grep command:

LC_ALL=C grep pattern test.cpp
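As a small illustration of that workaround, assuming GNU grep and a hypothetical sample_c.cpp created only for this demo, the variable assignment in front of the command applies to that single invocation and leaves the shell's locale unchanged. The second command shows an alternative that keeps the current locale and instead forces text mode with -a, the same flag used in the question.

```shell
# Hypothetical sample file: line 2 contains bytes 0x93/0x94 (CP-1252
# typographic quotes), which are invalid in a UTF-8 locale.
printf 'int main() {\n  // \223quoted\224 comment\n  return 0;\n}\n' > sample_c.cpp

# LC_ALL=C applies to this one grep process only.
matches=$(LC_ALL=C grep -n 'return' sample_c.cpp)
echo "$matches"

# Alternative: keep the locale, but tell grep to treat the file as text.
matches_text=$(grep -an 'return' sample_c.cpp)
echo "$matches_text"
```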

A better long-term solution is to convert the text files to UTF-8, using iconv or your favourite text editor.
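A conversion with iconv might look like the sketch below. It assumes the source file is Windows-1252; the file names legacy.cpp and legacy.utf8.cpp are hypothetical, and the source encoding passed to -f has to match the real file (for a UTF-16 file it would be UTF-16 instead).

```shell
# Hypothetical input: one line with CP-1252 typographic quotes (0x93, 0x94).
printf '// \223quoted\224 comment\n' > legacy.cpp

# Convert from Windows-1252 to UTF-8 into a new file; the original is
# left untouched so the result can be inspected before replacing it.
iconv -f CP1252 -t UTF-8 legacy.cpp > legacy.utf8.cpp

# Sanity check: a UTF-8 to UTF-8 pass fails on invalid input, so it
# succeeds only if the converted file is well-formed UTF-8.
if iconv -f UTF-8 -t UTF-8 legacy.utf8.cpp > /dev/null 2>&1; then
  echo "legacy.utf8.cpp is valid UTF-8"
fi
```

Writing to a separate file rather than converting in place is a deliberate choice: if the guessed source encoding is wrong, the original bytes are still available.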
