(grep) Regex to match non-ASCII characters?


Problem description


On Linux, I have a directory with lots of files. Some of them have non-ASCII characters, but they are all valid UTF-8. One program has a bug that prevents it working with non-ASCII filenames, and I have to find out how many are affected. I was going to do this with find and then do a grep to print the non-ASCII characters, and then do a wc -l to find the number. It doesn't have to be grep; I can use any standard Unix regular expression, like Perl, sed, AWK, etc.

However, is there a regular expression for 'any character that's not an ASCII character'?

Solution

This will match a single non-ASCII character:

[^\x00-\x7F]

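This is a valid PCRE (Perl-Compatible Regular Expression), so with GNU grep it needs the -P option; the \x escapes are not understood in grep's basic or extended syntax.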
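Since the question allows any tool, here is a minimal sketch of the same count in Python rather than in a find/grep pipeline. It reuses the character class above unchanged. The helper names `has_non_ascii` and `count_non_ascii_names` are hypothetical, and the sketch assumes only regular file names (not directory names) should be counted.

```python
import os
import re

# Same character class as in the answer: any character outside U+0000..U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(name):
    """Return True if the given name contains at least one non-ASCII character."""
    return NON_ASCII.search(name) is not None


def count_non_ascii_names(root):
    """Count files under root (recursively) whose own name has a non-ASCII character.

    Assumption: only file names are checked, not the directories leading to them.
    """
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if has_non_ascii(name):
                count += 1
    return count


if __name__ == "__main__":
    # Count affected files below the current directory.
    print(count_non_ascii_names("."))
```

Checking each file's own name, rather than its full path, avoids counting every file that merely sits inside a directory with a non-ASCII name.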

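You can also use POSIX-style bracket classes (note that [:ascii:] is a Perl/PCRE extension, not one of the classes defined by POSIX itself):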

  • [[:ascii:]] - matches a single ASCII char
  • [^[:ascii:]] - matches a single non-ASCII char

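[^[:print:]] may also suffice for you, but it matches non-printable characters rather than non-ASCII ones: in a UTF-8 locale, printable non-ASCII characters are not matched, while ASCII control characters are.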
