Unicode search not working


Question

Consider this.

# -*- coding: utf-8 -*-
import re

data = "cdbsb \xe2\x80\xa6 abc"  # a byte string (str) in Python 2
print data
# prints: cdbsb … abc
print re.findall(ur"[\u2026]", data)  # the ellipsis is not found

Why can't re find this Unicode character? I have already checked that

\xe2\x80\xa6 === … === U+2026

Recommended answer

If you go through the link provided by nhahtdh, "Solving Unicode Problems in Python 2.7", you can see that the original string was in bytes while we were searching for unicode, so it could never have worked.
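To make the mismatch concrete, here is a minimal sketch comparing the two representations. It is not from the original answer: it uses explicit b"..." and u"..." literals so that it runs unchanged on Python 2.7 and Python 3, and the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
raw = b"cdbsb \xe2\x80\xa6 abc"   # UTF-8 bytes, as in the question
text = raw.decode("utf-8")        # the same content as Unicode text

# The ellipsis occupies three bytes in `raw` but is one code point in `text`.
byte_len = len(raw)
char_len = len(text)

# Every element of a byte string is a value in 0..255, so none of them can
# equal the code point U+2026 (8230) that the pattern [\u2026] asks for.
byte_values = list(bytearray(raw))
any_byte_is_ellipsis = any(b == 0x2026 for b in byte_values)

# After decoding, the code point is really there.
text_has_ellipsis = u"\u2026" in text
```

The byte string is two elements longer than the decoded text, which is the three-byte UTF-8 sequence collapsing into a single character.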

encode(): gets you from Unicode → bytes

decode(): gets you from bytes → Unicode
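As a quick illustration of the two directions, the following sketch round-trips the ellipsis character through UTF-8. The variable names are assumptions chosen for this example, and the literals are written so the snippet works on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
ellipsis = u"\u2026"                  # Unicode text: HORIZONTAL ELLIPSIS

encoded = ellipsis.encode("utf-8")    # Unicode -> bytes
decoded = encoded.decode("utf-8")     # bytes -> Unicode

# The round trip returns the original character, and the encoded form is
# exactly the byte sequence that appears in the question's string.
round_trip_ok = (decoded == ellipsis)
matches_question_bytes = (encoded == b"\xe2\x80\xa6")
```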

Following these, we can solve it in two ways.

# -*- coding: utf-8 -*-
import re

# Option 1: decode the bytes to unicode, then search with a unicode pattern
data = "cdbsb \xe2\x80\xa6 abc".decode("utf-8")
print data
print re.findall(ur"[\u2026]", data)
# encode the match back to UTF-8 bytes for printing
print re.findall(ur"[\u2026]", data)[0].encode("utf-8")

# Option 2: leave the data as bytes and search for the byte sequence
data1 = "cdbsb \xe2\x80\xa6 abc"
print data1
print re.findall(r"\xe2\x80\xa6", data1)[0]
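If the input may arrive either as bytes or as already-decoded text, the first approach can be wrapped in a small helper. This is only a sketch under stated assumptions: find_ellipsis and ELLIPSIS are hypothetical names, not part of the original answer, and the input is assumed to be UTF-8 unless another encoding is passed. It runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

ELLIPSIS = re.compile(u"\u2026")   # unicode pattern for U+2026

def find_ellipsis(data, encoding="utf-8"):
    """Return all U+2026 matches in `data` as Unicode strings.

    `data` may be a byte string or Unicode text; bytes are decoded first.
    (Hypothetical helper, not part of the original answer.)
    """
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return ELLIPSIS.findall(data)

from_bytes = find_ellipsis(b"cdbsb \xe2\x80\xa6 abc")
from_text = find_ellipsis(u"cdbsb \u2026 abc")
```

Decoding once at the boundary keeps the pattern and the data in the same type, so both calls return the same list of matches.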
