How do I grep for all non-ASCII characters?

Question

I have several very large XML files and I'm trying to find the lines that contain non-ASCII characters. I've tried the following:

grep -e "[\x{00FF}-\x{FFFF}]" file.xml

But this returns every line in the file, regardless of whether the line contains a character in the range specified.

Do I have the syntax wrong or am I doing something else wrong? I've also tried:

egrep "[\x{00FF}-\x{FFFF}]" file.xml

(with both single and double quotes surrounding the pattern).

Answer

You can use the command:

grep --color='auto' -P -n "[\x80-\xFF]" file.xml

This prints the line number of each matching line and, when the output goes to a terminal, highlights the non-ASCII characters in red.
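A minimal sketch of the command above, assuming GNU grep built with PCRE support. The `LC_ALL=C` prefix is an addition not found in the answer: in the C locale, `-P` compares raw bytes, so `[\x80-\xFF]` matches every byte of a UTF-8 multibyte sequence no matter which character it encodes. The file name `euro.txt` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file; \342\202\254 are the three UTF-8 bytes of the euro sign.
printf 'ascii line\nprice: 5 \342\202\254\n' > euro.txt

# In the C locale, -P matches bytes, so any byte from 0x80 to 0xFF counts.
# --color='auto' only colours the match when writing to a terminal.
LC_ALL=C grep --color='auto' -P -n '[\x80-\xFF]' euro.txt
```

Only line 2 should be reported, because line 1 is pure ASCII.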

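On some systems, depending on your locale settings, the command above does not work. In a UTF-8 locale, for example, `-P` reads `\x80-\xFF` as the code points U+0080 to U+00FF rather than as raw bytes, so characters such as € are missed. In that case, grep for the inverse, i.e. anything outside the ASCII range: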

grep --color='auto' -P -n "[^\x00-\x7F]" file.xml
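The inverse pattern can be tried on a small sample. This is a sketch assuming GNU grep with PCRE support; `sample.xml` is a hypothetical file, and `LC_ALL=C` is added here only so the result does not depend on the machine's locale.

```shell
#!/bin/sh
# Hypothetical sample: only line 2 contains a non-ASCII character ("é", bytes \303\251).
printf '<a>plain</a>\n<b>caf\303\251</b>\n<c>ascii only</c>\n' > sample.xml

# Report every line containing something outside 0x00-0x7F, with its line number.
LC_ALL=C grep -P -n '[^\x00-\x7F]' sample.xml

# Count the matching lines instead of printing them.
LC_ALL=C grep -P -c '[^\x00-\x7F]' sample.xml
```

The first command should print line 2 only and the second should print `1`.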

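Note that the important bit is the -P flag, which is equivalent to --perl-regexp: it makes grep interpret the pattern as a Perl-compatible regular expression. grep's default basic and extended regular expressions (the latter used by egrep) do not support \x escapes, which is why the patterns in the question do not work. The grep manual page, at least in older versions, also warns that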

"This is highly experimental and grep -P may warn of unimplemented features."
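To see why -P matters, the sketch below runs the same bracket expression with and without it. It assumes GNU grep in the C locale; `compare.xml` is a hypothetical file. Without -P, a backslash inside a POSIX bracket expression is an ordinary character, so `[^\x00-\x7F]` becomes a negated set of literal characters and matches nearly every line, much like the behaviour described in the question.

```shell
#!/bin/sh
# Hypothetical sample: only line 2 contains a non-ASCII character ("ü", bytes \303\274).
printf '<a>plain</a>\n<b>\303\274ber</b>\n<c>ascii only</c>\n' > compare.xml

# With -P, \x00 and \x7F are hex escapes, so only line 2 is counted.
LC_ALL=C grep -P -c '[^\x00-\x7F]' compare.xml

# Without -P, the set members are the literal characters \, x, 0, the range 0-\,
# 7 and F; any line containing, e.g., a lowercase letter other than x matches.
LC_ALL=C grep -c '[^\x00-\x7F]' compare.xml
```

The first count should be `1` and the second `3`, since every sample line contains ordinary lowercase letters.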
