Grep thinks text file is binary, but it isn't

Views: 747  Published: 2018/5/28 19:22:41  Tags: windows text grep binaryfiles git-bash


Question

I came across a .cpp file in our codebase that grep treats as binary, so I can't grep it like a text file. That is annoying and obviously not how things ought to be, so I want to know why grep thinks the file is binary and fix the issue.

I tried to find any out-of-the-ordinary characters using this command:
grep -Pna --color -r "[\x00-\x08]|[\x10-\x19]|[\x80-\xFF]" test.cpp
but it doesn't yield any matches.

How can I figure out the cause of this problem?

I should mention that I'm using Git Bash on Windows.

Output of locale:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

Answer
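Since you're using MS Windows, it's possible that test.cpp is encoded in either UTF-16 (common in recent versions of Windows) or Windows-1252 (CP-1252) rather than UTF-8. With CP-1252 a single non-ASCII character is enough, perhaps a typographic quote in one of the comments: CP-1252 stores such a curly quote as a single byte (for example 0x93), which is not valid UTF-8.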
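
To check which of these two cases applies before changing anything, a small Bash function along the following lines could help. It is a sketch, not part of the original answer: the function name check_encoding is hypothetical, and it assumes tr, wc and iconv are on the PATH. It counts NUL bytes (UTF-16 stores every ASCII character as two bytes, one of which is NUL) and then asks iconv to re-read the file as UTF-8, which fails at the first invalid byte such as a CP-1252 curly quote.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess why grep treats a file as binary.
# Assumes tr, wc and iconv are available on the PATH.
check_encoding() {
  local file=$1
  local nuls

  # Keep only the NUL bytes and count them; $(( )) strips any padding
  # that some wc implementations add. UTF-16 stores every ASCII
  # character as two bytes, one of which is NUL.
  nuls=$(( $(tr -cd '\000' < "$file" | wc -c) ))
  if [ "$nuls" -gt 0 ]; then
    echo "$file: $nuls NUL byte(s), possibly UTF-16"
    return 1
  fi

  # iconv exits with a non-zero status at the first byte sequence that
  # is not valid UTF-8, e.g. a CP-1252 curly quote (0x93).
  if ! iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
    echo "$file: not valid UTF-8, possibly CP-1252 or another 8-bit encoding"
    return 1
  fi

  echo "$file: valid UTF-8 without NUL bytes"
}
```

Running check_encoding test.cpp prints one line describing the likely cause and returns a non-zero status if the file would trip grep's binary detection.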

When your locale is set to UTF-8 and grep finds byte sequences that are invalid in that locale, it assumes the file is binary. A quick workaround is to make grep use the C locale by temporarily setting the LC_ALL environment variable for that one command:
LC_ALL=C grep pattern test.cpp
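
In the C locale every byte counts as one character, so the same setting can also show where the unusual bytes are. The sketch below is an addition to the original answer, and the function name show_suspect_lines is hypothetical. It lists, with line numbers, every line containing a byte that is neither printable ASCII nor whitespace; piping the result through cat -v makes those bytes visible (a CP-1252 opening quote, 0x93, shows up as M-^S).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the lines of a file that contain any byte
# outside printable ASCII and whitespace, prefixed with line numbers.
show_suspect_lines() {
  # LC_ALL=C makes grep treat every byte as one character, so bytes
  # that are invalid UTF-8 can be matched. -a forces text mode in case
  # the file contains NUL bytes; -n prints line numbers.
  # [:space:] also covers the tab and the CR of Windows CRLF endings.
  LC_ALL=C grep -na '[^[:print:][:space:]]' "$1"
}
```

For example, show_suspect_lines test.cpp | cat -v. grep exits with status 1 when no such line exists.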
A better long-term solution, and the only real fix for a UTF-16 file (whose NUL bytes make grep treat it as binary in any locale), is to convert the text files to UTF-8 using iconv or your favourite text editor.
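
The conversion can be scripted with iconv. The following is a sketch, not part of the original answer: the function name to_utf8 is hypothetical, and you have to supply the source encoding yourself, because iconv converts from whatever encoding it is told rather than detecting it. It writes to a temporary file first, so a failed conversion leaves the original untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-encode a file to UTF-8 in place.
# Usage: to_utf8 SOURCE_ENCODING FILE
to_utf8() {
  local from=$1 file=$2
  local tmp
  tmp=$(mktemp) || return 1
  # Convert into a temporary file first so that a failed conversion
  # (for example a wrongly guessed source encoding) changes nothing.
  if iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
    # Copy the bytes back instead of using mv, which keeps the
    # original file's permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, to_utf8 WINDOWS-1252 test.cpp for a CP-1252 file, or to_utf8 UTF-16 test.cpp for a UTF-16 file that starts with a byte-order mark.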
