(grep) Regex to match non-ASCII characters?
Problem description
On Linux, I have a directory with lots of files. Some of them have non-ASCII characters in their names, but they are all valid UTF-8. One program has a bug that prevents it from working with non-ASCII filenames, and I have to find out how many files are affected. I was going to do this with find, then use grep to print only the names that contain non-ASCII characters, and then pipe the result to wc -l to count them. It doesn't have to be grep; I can use any standard Unix regular expression tool, like Perl, sed, AWK, etc.
So, is there a regular expression for 'any character that is not an ASCII character'?
Answer
This will match a single non-ASCII character:
[^\x00-\x7F]
This is valid PCRE (Perl-Compatible Regular Expression) syntax. GNU grep accepts it with the -P option, e.g. grep -P '[^\x00-\x7F]'.
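Here is a minimal sketch of the find | grep | wc -l pipeline from the question, using the [^\x00-\x7F] pattern. It assumes GNU find (for -printf) and GNU grep built with PCRE support (for -P). The function name count_non_ascii_names is purely illustrative. Names are counted one per line, so a filename that itself contains a newline would be miscounted.

```shell
#!/bin/sh
# Sketch of the find | grep | wc -l approach from the question.
# Assumes GNU find (for -printf) and GNU grep with PCRE support (for -P).
# count_non_ascii_names is a hypothetical helper name.

# Count entries below directory $1 whose own name contains a non-ASCII character.
count_non_ascii_names() {
    # -mindepth 1 skips the starting directory itself; %f prints only the
    # last path component, so a non-ASCII parent directory does not make
    # every entry beneath it match.
    find "$1" -mindepth 1 -printf '%f\n' |
        grep -P '[^\x00-\x7F]' |
        wc -l
}

# Usage (the path is a placeholder):
# count_non_ascii_names /path/to/dir
```

Printing only the name component (%f) rather than the full path is a design choice. Each file is counted by its own name, which matches the goal of finding affected filenames.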
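You can also use the [:ascii:] bracket class. Despite its POSIX-style syntax, it is a Perl/PCRE extension rather than one of the standard POSIX classes, so with grep it needs -P:
[[:ascii:]]
- matches a single ASCII character
[^[:ascii:]]
- matches a single non-ASCII character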
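The two bracket classes complement each other. The sketch below uses them to split names into those made only of ASCII characters and those containing at least one non-ASCII character. It assumes GNU find and GNU grep with -P, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: use [[:ascii:]] and [^[:ascii:]] to split names into two groups.
# Assumes GNU find (for -printf) and GNU grep with PCRE support (-P), since
# [:ascii:] is a Perl/PCRE extension. Function names are illustrative only.

# Names made up entirely of ASCII characters: [[:ascii:]]* anchored to the
# whole line, so every character must belong to the class.
ascii_only_names() {
    find "$1" -mindepth 1 -printf '%f\n' | grep -P '^[[:ascii:]]*$'
}

# Names containing at least one non-ASCII character: a single
# [^[:ascii:]] anywhere in the line is enough for a match.
non_ascii_names() {
    find "$1" -mindepth 1 -printf '%f\n' | grep -P '[^[:ascii:]]'
}

# Usage (the path is a placeholder):
# non_ascii_names /path/to/dir
```

Listing the names instead of counting them makes it easier to check which files the buggy program will trip over. Appending | wc -l gives the count.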
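[^[:print:]] (any non-printable character) will probably suffice for you too, as long as grep runs in the C locale (for example LC_ALL=C grep '[^[:print:]]'). In that locale, every byte above 0x7F is non-printable. In a UTF-8 locale, letters such as é count as printable and are not matched. The pattern also matches ASCII control characters such as tab.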
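Here is a sketch of the [^[:print:]] variant, counting names the same way as the question's pipeline. Because [:print:] is locale-dependent, it runs grep with LC_ALL=C. In that locale, bytes above 0x7F are not printable, so every UTF-8 multibyte character produces a match. In a UTF-8 locale, a name like café.txt would count as fully printable. This assumes GNU find for -printf, and the function name is illustrative only.

```shell
#!/bin/sh
# Sketch of the [^[:print:]] approach. LC_ALL=C makes grep treat every
# byte above 0x7F as non-printable, so any UTF-8 multibyte character
# matches. Assumes GNU find (for -printf); count_nonprint_names is a
# hypothetical helper name.
count_nonprint_names() {
    # Caveat: ASCII control characters (tab, for example) also match,
    # so this can over-count compared with the [^\x00-\x7F] pattern.
    find "$1" -mindepth 1 -printf '%f\n' |
        LC_ALL=C grep '[^[:print:]]' |
        wc -l
}

# Usage (the path is a placeholder):
# count_nonprint_names /path/to/dir
```

This version needs no PCRE support, which can help on systems whose grep lacks -P. That portability comes at the cost of the control-character caveat.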