(grep) Regex to match non-ASCII characters?

Views: 170 | Published: 2018/5/28 19:07:30 | Tags: regex, unicode, grep, ascii


Problem description

On Linux, I have a directory with lots of files. Some of them have non-ASCII characters, but they are all valid UTF-8. One program has a bug that prevents it working with non-ASCII filenames, and I have to find out how many are affected. I was going to do this with find and then do a grep to print the non-ASCII characters, and then do a wc -l to find the number. It doesn't have to be grep; I can use any standard Unix regular expression, like Perl, sed, AWK, etc.

However, is there a regular expression for 'any character that's not an ASCII character'?
Solution
This will match a single non-ASCII character:
[^\x00-\x7F]
This is a valid PCRE (Perl-Compatible Regular Expression), so with GNU grep it needs the -P option.
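As a rough illustration of the character class above, here is a minimal Python sketch. Python's `re` module accepts the same `[^\x00-\x7F]` class. The helper name `has_non_ascii` and the sample file names are made up for this example and are not part of the original answer.

```python
import re

# Same character class as in the answer: any character outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(name: str) -> bool:
    """Return True if the string contains at least one non-ASCII character."""
    return NON_ASCII.search(name) is not None


# Hypothetical sample names, used only for illustration.
names = ["report.txt", "r\u00e9sum\u00e9.pdf", "notes.md", "\u65e5\u672c\u8a9e.txt"]

# Keep only the names that contain a non-ASCII character, then count them.
affected = [n for n in names if has_non_ascii(n)]
print(len(affected))
```

The search stops at the first non-ASCII character, which is all that is needed to decide whether a name is affected.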

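You can also use the POSIX-style bracket shorthands (the PCRE documentation describes [:ascii:] as a Perl extension rather than a class defined by POSIX, so tools limited to POSIX regular expressions may reject it):

[[:ascii:]] - matches a single ASCII char
[^[:ascii:]] - matches a single non-ASCII char

[^[:print:]] will probably suffice for you. Note that it also matches ASCII control characters, and which non-ASCII characters count as printable depends on the locale.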
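Applied to the original problem (counting the files whose names are affected), the idea could be sketched in Python as below. This is one possible approach under stated assumptions, not the method from the answer: the function name `count_non_ascii_filenames` is hypothetical, and only the base name of each entry is checked, not the directory path.

```python
import os
import re

# Any character outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def count_non_ascii_filenames(root: str) -> int:
    """Count non-directory entries under root whose base name is not pure ASCII."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Names that cannot be decoded are surfaced by Python as
            # surrogate escapes, which also fall outside the ASCII range.
            if NON_ASCII.search(filename):
                count += 1
    return count


if __name__ == "__main__":
    # Assumption: the directory of interest is the current one.
    print(count_non_ascii_filenames("."))
```

Checking only the base name means a plain-ASCII file inside a non-ASCII directory is not counted. Whether that is the right choice depends on how the buggy program handles paths.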
