What is the proper regular expression to match all UTF-8/Unicode lowercase letter forms?

Views: 188 Published: 2020/7/13 3:45:51 Tags: python regex unicode utf-8


Question


I would like to match all lowercase letter forms in the Latin block. The trivial '[a-z]' only matches characters between U+0061 and U+007A, and not all the other lowercase forms.

I would like to match all lowercase letters, most importantly, all the accented lowercase letters in the Latin block used in EFIGS languages.

[a-zà-ý] is a start, but there are still tons of other lowercase characters (see http://www.unicode.org/charts/PDF/U0000.pdf). Is there a recommended way of doing this?

FYI I'm using Python, but I suspect that this problem is cross-language.

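Python's built-in islower() method seems to do the right checking:

# Python 3: collect every code point in the Basic Multilingual Plane
# that str.islower() reports as lowercase.
lower = ''
for c in range(0, 2**16):
  if chr(c).islower():
    lower += chr(c)

print(lower)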

Solution

Python's built-in re module does not currently support Unicode properties in regular expressions. See this answer for a link to the Ponyguruma library, which does support them.

Using such a library, you could use \p{Ll} to match any lowercase letter in a Unicode string.

Every character in the Unicode Standard belongs to exactly one general category. \p{Ll} is the category of lowercase letters, while \p{L} covers all the characters in any of the "Letter" categories (Letter, uppercase; Letter, lowercase; Letter, titlecase; Letter, modifier; and Letter, other). For more information, see the Character Properties chapter of the Unicode Standard. Or see this page for a good explanation of how to use Unicode in regular expressions.
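If installing an extra library is not an option, one possible workaround is to build the character class yourself from the standard library's unicodedata module. The following is only a sketch under stated assumptions: it targets Python 3, the helper names build_lowercase_class and _escape are hypothetical, and the resulting class reflects whatever Unicode version your Python build ships with. It approximates \p{Ll} by collecting every code point whose general category is "Ll" and collapsing consecutive code points into ranges. (As far as I know, the third-party regex package on PyPI accepts \p{Ll} directly, which would make this unnecessary.)

```python
import re
import sys
import unicodedata


def _escape(cp):
    # Emit an explicit \uXXXX or \UXXXXXXXX escape so that no character
    # can be misread as regex syntax inside the character class.
    return "\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp


def build_lowercase_class():
    """Build a character class covering every code point in category Ll.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = []
    start = None  # first code point of the current run of Ll characters
    # Iterate one step past the last code point to flush a trailing run.
    for cp in range(sys.maxunicode + 2):
        is_ll = cp <= sys.maxunicode and unicodedata.category(chr(cp)) == "Ll"
        if is_ll and start is None:
            start = cp
        elif not is_ll and start is not None:
            end = cp - 1
            if start == end:
                parts.append(_escape(start))
            else:
                parts.append(_escape(start) + "-" + _escape(end))
            start = None
    return "[" + "".join(parts) + "]"


# Compile once and reuse; building the class scans every code point.
LOWER = re.compile(build_lowercase_class())
```

With this in place, LOWER.findall("Één Straße") should return the lowercase letters é, n, t, r, a, ß and e, while the uppercase É and S and the space are skipped. Note that category Ll is slightly narrower than what islower() accepts, so the two approaches may not agree on every character.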
