How do I grep for all non-ASCII characters?

Views: 179 Posted: 2018/5/28 19:06:37 Tags: regex unix unicode grep

Problem description

I have several very large XML files and I'm trying to find the lines that contain non-ASCII characters. I've tried the following:

  grep -e "[\x{00FF}-\x{FFFF}]" file.xml

But this returns every line in the file, regardless of whether the line contains a character in the range specified.

Do I have the syntax wrong, or am I doing something else wrong? I've also tried:

  egrep "[\x{00FF}-\x{FFFF}]" file.xml

(with both single and double quotes surrounding the pattern).

Solution

You can use the command:

  grep --color='auto' -P -n "[\x80-\xFF]" file.xml

The -n flag prints the line number of each matching line, and --color='auto' highlights the matched non-ASCII characters in red.

On some systems, depending on your settings, the above will not work. In that case you can grep for the inverse, that is, for any character outside the ASCII range:

  grep --color='auto' -P -n "[^\x00-\x7F]" file.xml

Note that the important part is the -P flag, which is equivalent to --perl-regexp: it makes grep interpret the pattern as a Perl regular expression. The grep documentation also says of this option that

  this is highly experimental and grep -P may warn of unimplemented features.
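
To try the inverse pattern on something small before running it against a large XML file, a minimal sketch along these lines may help. The sample file and its contents are made up for illustration, and the LC_ALL=C prefix is an assumption added here (not part of the answer above) so that grep treats the input as raw bytes regardless of the system locale. It also assumes a grep that supports -P, such as GNU grep built with PCRE.

```shell
# Hypothetical three-line sample; only line 2 contains a non-ASCII
# character (e-acute, written as the UTF-8 bytes C3 A9 in octal escapes).
sample=$(mktemp)
printf 'plain ascii line\ncaf\303\251 au lait\nanother ascii line\n' > "$sample"

# Assumption: LC_ALL=C forces byte-wise matching, so any byte outside
# 0x00-0x7F counts as a match. -n prefixes each hit with its line number.
matches=$(LC_ALL=C grep -n -P '[^\x00-\x7F]' "$sample")
printf '%s\n' "$matches"

# Clean up the temporary file.
rm -f "$sample"
```

Only the second line of the sample should be reported. Adding --color='auto' as in the commands above would also highlight the offending bytes when the output goes to a terminal.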
