用Python将Unicode字符串压缩在一起 [英] Zipping together unicode strings in Python

查看:61
本文介绍了用Python将Unicode字符串压缩在一起的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个字符串:

a = "ÀÁÂÃÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ" b = "àáâãäèéçêëìíîïòóôõöùúûüÿ"

我想创建字符串

"ÀàÁáÂâ..."

即将字符串分成两部分,然后将两半压缩在一起.

I.e split the string in two and then zip the halves together.

我尝试过幼稚的zip(a, b),但这没有用.我认为这是由于Unicode问题所致.

I tried the naive zip(a, b) but this didn't work. I think this is due to a problem with unicode.

有人知道我如何获得想要的结果吗?

Does anyone know how I can get the result I want?

推荐答案

在Python 2.x中,默认情况下字符串不是unicode.处理unicode数据时,您必须执行以下操作:

In Python 2.x, strings are not unicode by default. When dealing with unicode data, you have to do the following:

  • 带有u 字符的前缀字符串文字:a = u'ÀÁÂÃÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ'

如果要避免使用u前缀,并且要使用的模块具有足够的兼容性,请使用from __future__ import unicode_literals import 将字符串文字默认情况下解释为unicode

if you want to avoid the u prefix and if the modules you are working with are enough compatible, use from __future__ import unicode_literals import to make string literals interpreted as unicode by default

如果您直接在Python代码中编写unicode字符串文字,请将您的.py文件保存为utf-8格式,以便正确解释这些文字. Python 2.3及更高版本将解释 utf-8 BOM ;好的做法还可以在网址添加特定的注释行.文件开头以指示类似# -*- coding: utf-8 -*-

if you write unicode string literals directly in your Python code, save your .py file in utf-8 format so that the literals are correctly interpreted. Python 2.3+ will interpret the utf-8 BOM ; a good practice is also to add a specific comment line at the beginning of the file to indicate the encoding like # -*- coding: utf-8 -*-, or

您还可以继续将.py文件保存在ascii中,但是您将需要在文字中转义unicode字符,这可能会导致可读性降低:'ÀÁÂÃ'应该成为'\xc0\xc1\xc2\xc3'

you can also keep saving the .py file in ascii, but you will need to escape the unicode characters in the literals, which can be less readable: 'ÀÁÂÃ' should become '\xc0\xc1\xc2\xc3'

一旦满足这些条件,剩下的就是将算法应用于那些unicode字符串上,就像使用str版本一样.这是解决__future__导入问题的一种可能的解决方案:

Once you fulfill those conditions, the rest is about applying algorithms on those unicode strings the same way you would work with the str version. Here is one possible solution for your problem with the __future__ import:

from __future__ import unicode_literals

from itertools import chain
a = "ÀÁÂÃÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ"
b = "àáâãäèéçêëìíîïòóôõöùúûüÿ"

print ''.join(chain(*zip(a,b)))

>>> ÀàÁáÂâÃãÈäÉèÊéËçÌêÍëÎìÏíÒîÓïÔòÕóÖôÙõÚöÛùÜú

更多参考文献:

  • PEP 263 defines the non-ascii encoding comments
  • PEP 3120 defines utf-8 as the default encoding in Python 3

这篇关于用Python将Unicode字符串压缩在一起的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆