Python regular expression with UTF-8 issue

Views: 191 Posted: 2020/7/13 2:45:50 Tags: python regex utf-8 python-2.7


Question

I have a file that contains many lines of plain UTF-8 text, written in Chinese. A sample line looks like this:

PROCESS：类型：关爱积分[NOTIFY]   交易号：2012022900000109   订单号：W12022910079166    交易金额：0.01元    交易状态：true 2012-2-29 10:13:08

The file itself is saved in UTF-8. The file name is xx.txt.

Here is my Python code; the environment is Python 2.7.

#coding: utf-8
import re
pattern = re.compile(r'交易金额：(\d+)元')
for line in open('xx.txt'):
    match = pattern.match(line.decode('utf-8'))
    if match:
        print match.group()

The problem is that I get no results.

I want to extract the decimal string from 交易金额：0.01元, which in this case is 0.01.

Why doesn't this code work? Can anyone explain it to me? I have no clue.

Recommended answer

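There are several issues with your code. First, the pattern should be a unicode string: use re.compile(ur'<unicode string>'). Without the u prefix, Python 2 treats the pattern as a byte string of UTF-8 bytes, which does not compare equal to the code points in the decoded line. Adding the re.UNICODE flag is also reasonable, although it is not strictly needed here: it only changes how classes such as \w and \d are interpreted, and the digits in this file are ASCII. Second, pattern.match() only matches at the beginning of the string, and the amount sits in the middle of the line, so use pattern.search() instead. Third, even then you will not get a match, because \d+ matches only a run of digits and stops at the decimal point; use \d+\.?\d+ instead (digits, an optional dot, then more digits). Note that this pattern needs at least two digits, so a single-digit amount such as 5 would not match; \d+(?:\.\d+)? covers that case as well. Example code: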

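#coding: utf-8
import re

# Unicode literal, so the text and the pattern are both compared as code points
text = u"PROCESS：类型：关爱积分[NOTIFY]   交易号：2012022900000109   订单号：W12022910079166    交易金额：0.01元    交易状态：true 2012-2-29 10:13:08"
# ur'' is a raw unicode string literal (Python 2 only)
pattern = re.compile(ur'交易金额：(\d+\.?\d+)元', re.UNICODE)

# search() scans the whole string; group(1) is the captured amount
print pattern.search(text).group(1)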
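
For readers on Python 3, where str is already Unicode and the ur'' prefix no longer exists, the same fix might look roughly like the sketch below. It is not part of the original answer: the names LINE, AMOUNT_RE and extract_amount are made up for illustration, and the pattern uses an optional fractional part, \d+(?:\.\d+)?, on the assumption that whole-number amounts such as 5 should also be accepted.

```python
# -*- coding: utf-8 -*-
import re

# Sample line from the question (the full-width colons are part of the data).
LINE = (
    "PROCESS：类型：关爱积分[NOTIFY]   交易号：2012022900000109   "
    "订单号：W12022910079166    交易金额：0.01元    交易状态：true 2012-2-29 10:13:08"
)

# Integer part followed by an optional fractional part,
# so both "5" and "0.01" are accepted (an assumption about the data).
AMOUNT_RE = re.compile(r"交易金额：(\d+(?:\.\d+)?)元")


def extract_amount(line):
    """Return the amount as a string, or None if the line has no amount field."""
    # search() scans the whole line; match() would only try position 0.
    found = AMOUNT_RE.search(line)
    return found.group(1) if found else None


print(extract_amount(LINE))  # 0.01
```

To process the file from the question, one option is to iterate over open('xx.txt', encoding='utf-8') and pass each line to extract_amount; opening the file with an explicit encoding replaces the manual line.decode('utf-8') call used in the Python 2 code.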
