Matching Unicode characters in Python regular expressions

Posted: 2021/6/25 20:17:30. Tags: python regex unicode non-ascii-characters character-properties


Question


I have read through the other questions on Stack Overflow, but I am still no closer. Sorry if this has already been answered, but I couldn't get any of the suggestions there to work.

>>> import re
>>> m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/xmas/xmas1.jpg')
>>> print m.groupdict()
{'tag': 'xmas', 'filename': 'xmas1.jpg'}

All is well. Then I try something with Norwegian characters in it (or something more Unicode-like):

>>> m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/påske/øyfjell.jpg')
>>> print m.groupdict()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groupdict'

How can I match typical Unicode characters such as ø, æ and å? I'd like to match them in both the tag group above and the filename group.
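The failure can be reproduced in Python 3 with a minimal sketch, assuming Python 3.4 or later (for re.fullmatch). Passing re.ASCII restricts \w to [a-zA-Z0-9_], which is what Python 2 does when neither re.UNICODE nor re.LOCALE is given, so the å in påske ends the tag group early and re.match returns None. The helper name match_path is hypothetical and only wraps the question's pattern.

```python
import re

# Pattern from the question: a tag made of word characters, then a filename
# made of word characters or a few punctuation marks.
PATTERN = r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$'

def match_path(path, flags=0):
    """Hypothetical helper: return the named groups, or None if no match."""
    m = re.match(PATTERN, path, flags)
    return m.groupdict() if m else None

# re.ASCII mimics Python 2's default: \w is only [a-zA-Z0-9_].
print(match_path('/by_tag/xmas/xmas1.jpg', re.ASCII))     # {'tag': 'xmas', 'filename': 'xmas1.jpg'}
print(match_path('/by_tag/påske/øyfjell.jpg', re.ASCII))  # None

# Show which characters \w rejects under re.ASCII: only 'å' prints False.
for ch in 'påske':
    print(repr(ch), bool(re.fullmatch(r'\w', ch, re.ASCII)))
```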

Solution

You need to specify the re.UNICODE flag and pass your string as a Unicode string by using the u prefix:

>>> re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', u'/by_tag/påske/øyfjell.jpg', re.UNICODE).groupdict()
{'tag': u'p\xe5ske', 'filename': u'\xf8yfjell.jpg'}

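This is in Python 2. In Python 3 all strings (str) are Unicode and str patterns use Unicode matching by default, so neither the u prefix nor re.UNICODE is needed. The u prefix is a syntax error only in Python 3.0 to 3.2; from Python 3.3 on it is accepted again but redundant.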
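A minimal Python 3 sketch of the same answer, assuming Python 3.3 or later: the str pattern matches the Norwegian path with no flag at all. The second half shows the Python 3 counterpart of the original problem: a bytes pattern applied to UTF-8 encoded bytes treats \w as ASCII-only, so the path is decoded to str before matching. The helper name parse_tag_path is hypothetical.

```python
import re

# Same pattern as in the answer; Python 3 str patterns get Unicode
# matching by default, so no u prefix or re.UNICODE flag is needed.
TAG_PATH = re.compile(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$')

def parse_tag_path(path):
    """Hypothetical helper: accept str or UTF-8 bytes, return the groups or None."""
    if isinstance(path, bytes):
        # A bytes pattern would only treat ASCII as \w, so decode first.
        path = path.decode('utf-8')
    m = TAG_PATH.match(path)
    return m.groupdict() if m else None

print(parse_tag_path('/by_tag/påske/øyfjell.jpg'))
# {'tag': 'påske', 'filename': 'øyfjell.jpg'}

raw = '/by_tag/påske/øyfjell.jpg'.encode('utf-8')
# Matching the raw bytes with a bytes pattern fails: \w is ASCII-only for bytes.
print(re.match(rb'^/by_tag/(?P<tag>\w+)/', raw))  # None
print(parse_tag_path(raw))
# {'tag': 'påske', 'filename': 'øyfjell.jpg'}
```

Decoding at the boundary keeps a single str pattern for every input; passing re.UNICODE to TAG_PATH would be harmless but changes nothing, while combining re.UNICODE with a bytes pattern raises ValueError.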

