utf - string translation


Problem description




Hi,

I'm bringing over a thread that's going on on f.c.l.python.

The point was to get rid of French accents from words.

We noticed that len('à') != len('a') and I found the hack below to fix
the "problem" ... yet I do not understand - especially since 'à' is
included in the extended ASCII table, and thus can be stored in one byte.

Any clue?

hg

# -*- coding: utf-8 -*-
import string

def convert(mot):
    print len(mot)
    print mot[0]
    print '%x' % ord(mot[1])
    table = string.maketrans('àa?éèê?????ùü?', '\x00a\x00a\x00a\x00e\x00e\x00e\x00e\x00i\x00i\x00o\x00o\x00u\x00u\x00u')
    return mot.translate(table).replace('\x00', '')

c = 'àb?? a '
print convert(c)
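
The length mismatch in the post above depends on how the character is stored, which a short sketch can make visible. The thread's code is Python 2; the block below assumes Python 3, and its variable names are illustrative only. It encodes 'à' once as Latin-1 (one common "extended ASCII" table) and once as UTF-8, then checks which lead byte UTF-8 uses for the accented Latin-1 letters.

```python
# Python 3 sketch (an assumption; the thread itself uses Python 2).
a_grave = "\u00e0"  # à, LATIN SMALL LETTER A WITH GRAVE

latin1_bytes = a_grave.encode("latin-1")  # b'\xe0': a single byte
utf8_bytes = a_grave.encode("utf-8")      # b'\xc3\xa0': two bytes

print(len(latin1_bytes), len(utf8_bytes))

# Collect the first UTF-8 byte of every code point in U+00C0..U+00FF,
# the range that holds the accented Latin-1 letters.
lead_bytes = {chr(cp).encode("utf-8")[0] for cp in range(0xC0, 0x100)}
print(lead_bytes == {0xC3})
```

If the source file is saved as UTF-8, a Python 2 byte-string literal 'à' holds both bytes, so len() reports 2. A plausible reading of the hack is that its table maps the shared lead byte 0xC3 to '\x00' and each trailing byte to a plain letter, and the final replace() then drops the '\x00' bytes.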

Recommended answer

hg wrote:

> We noticed that len('à') != len('a')

sounds odd.

>>> len('à') == len('a')
True

are you perhaps using a UTF-8 editor?

to keep your sanity, no matter what editor you're using, I recommend
adding a coding directive to the source file, and using *only* Unicode
string literals for non-ASCII text.

or in other words, put this at the top of your file (where "utf-8" is
whatever your editor/system is using):

# -*- coding: utf-8 -*-

and use

u'<text>'

for all non-ASCII literals.

</F>
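
The advice above distinguishes byte strings from Unicode strings. As a rough illustration, assuming Python 3 (where every str is already Unicode and the u'' prefix is accepted but optional), the following sketch contrasts the byte count of UTF-8 data with the character count of the decoded text. The variable names are made up for the example.

```python
# Python 3 sketch (an assumption; the thread itself uses Python 2).
raw = b"\xc3\xa0"            # the UTF-8 bytes a UTF-8 editor writes for 'à'
text = raw.decode("utf-8")   # the same data as a Unicode string

print(len(raw), len(text))   # byte count, then character count

# A Unicode literal names the character directly, whatever the file encoding.
same = (text == u"\u00e0")
print(same)
```

Measuring length on the decoded string gives one unit per character, which is the behaviour the accent-stripping code needs.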


Fredrik Lundh wrote:

> hg wrote:
>
>> We noticed that len('à') != len('a')
>
> sounds odd.
>
> >>> len('à') == len('a')
> True
>
> are you perhaps using a UTF-8 editor?
>
> to keep your sanity, no matter what editor you're using, I recommend
> adding a coding directive to the source file, and using *only* Unicode
> string literals for non-ASCII text.
>
> or in other words, put this at the top of your file (where "utf-8" is
> whatever your editor/system is using):
>
> # -*- coding: utf-8 -*-
>
> and use
>
> u'<text>'
>
> for all non-ASCII literals.
>
> </F>

Hi,

The problem is that:

# -*- coding: utf-8 -*-
import string
print len('a')
print len('à')

returns 1 then 2

and string.maketrans(str1, str2) requires that len(str1) == len(str2)

hg
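
The equal-length requirement mentioned above stops being an obstacle once the table is built from Unicode characters rather than UTF-8 bytes. The sketch below assumes Python 3, where str.maketrans and str.translate work on characters. ACCENT_TABLE and both function names are hypothetical, and the list of accented letters is an illustrative choice, not the exact list from the original post.

```python
import unicodedata

# Hypothetical character-for-character table: 14 accented letters, 14 plain ones.
ACCENT_TABLE = str.maketrans("àâäéèêëîïôöùûü", "aaaeeeeiioouuu")


def strip_accents_table(word):
    """Replace each listed accented letter with its unaccented counterpart."""
    return word.translate(ACCENT_TABLE)


def strip_accents_nfd(word):
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents_table("\u00e9t\u00e9"))     # input is 'été'
print(strip_accents_nfd("\u00e0b\u00e7\u00e9"))  # input is 'àbçé'
```

The table version only touches the letters it lists, while the NFD version also handles letters that were not anticipated, such as 'ç', at the cost of removing every combining mark.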


hg wrote:

> Fredrik Lundh wrote:
>
>> hg wrote:
>>
>>> We noticed that len('à') != len('a')
>>
>> sounds odd.
>>
>> >>> len('à') == len('a')
>> True
>>
>> are you perhaps using a UTF-8 editor?
>>
>> to keep your sanity, no matter what editor you're using, I recommend
>> adding a coding directive to the source file, and using *only* Unicode
>> string literals for non-ASCII text.
>>
>> or in other words, put this at the top of your file (where "utf-8" is
>> whatever your editor/system is using):
>>
>> # -*- coding: utf-8 -*-
>>
>> and use
>>
>> u'<text>'
>>
>> for all non-ASCII literals.
>>
>> </F>
>
> Hi,
>
> The problem is that:
>
> # -*- coding: utf-8 -*-
> import string
> print len('a')
> print len('à')
>
> returns 1 then 2
>
> and string.maketrans(str1, str2) requires that len(str1) == len(str2)
>
> hg

PS: I'm running this under IDLE

