utf - string translation
Problem description
Hi,
I'm bringing over a thread that's going on on f.c.l.python.
The point was to get rid of French accents from words.
We noticed that len('à') != len('a') and I found the hack below to fix
the "problem" ... yet I do not understand - especially since 'à' is
included in the extended ASCII table, and thus can be stored in one byte.
Any clue?
hg
# -*- coding: utf-8 -*-
import string

def convert(mot):
    print len(mot)
    print mot[0]
    print '%x' % ord(mot[1])
    # Each accented letter is two bytes in UTF-8, so each replacement is
    # padded with '\x00' to keep both arguments the same length.
    table = string.maketrans('àâäéèêëîïôöùüû', '\x00a\x00a\x00a\x00e\x00e\x00e\x00e\x00i\x00i\x00o\x00o\x00u\x00u\x00u')
    return mot.translate(table).replace('\x00', '')

c = 'àb?? a '
print convert(c)
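A minimal sketch of why the hack above runs at all, written in Python 3 rather than the Python 2 of the post. It assumes the accented letters in the table were àâäéèêëîïôöùüû, and the names `src`, `dst` and `table` are illustrative. In UTF-8 each of those letters is two bytes, the first always 0xC3, so a byte-level table can send the lead byte to `\x00` and the second byte to the plain letter.

```python
# Python 3 sketch of the byte-level hack; the letter list is an assumption.
accented = "àâäéèêëîïôöùüû"
plain = "aaaeeeeiioouuu"

# Two bytes per accented letter in UTF-8, so pad each replacement to two bytes.
src = accented.encode("utf-8")
dst = b"".join(b"\x00" + ch.encode("ascii") for ch in plain)

# bytes.maketrans requires both arguments to have the same length.
table = bytes.maketrans(src, dst)


def convert(word: str) -> str:
    """Translate byte by byte, then drop the padding bytes."""
    raw = word.encode("utf-8").translate(table)
    # Bytes of characters outside the table may survive as non-ASCII;
    # they are shown as U+FFFD instead of raising an error.
    return raw.replace(b"\x00", b"").decode("ascii", errors="replace")


print(convert("àbé"))
```

Because the table works on single bytes, it also remaps those same bytes when they occur inside other characters, which is one reason the approach is fragile.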
Recommended answer
hg wrote:
> We noticed that len('à') != len('a')
sounds odd.
>>> len('à') == len('a')
True
are you perhaps using a UTF-8 editor?
to keep your sanity, no matter what editor you're using, I recommend
adding a coding directive to the source file, and using *only* Unicode
string literals for non-ASCII text.
or in other words, put this at the top of your file (where "utf-8" is
whatever your editor/system is using):
# -*- coding: utf-8 -*-
and use
u'<text>'
for all non-ASCII literals.
</F>
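To illustrate the difference between a byte string and a Unicode string, here is a small Python 3 sketch. It is an assumption-laden stand-in for the Python 2 session above: in Python 3 every `str` literal is already Unicode (like `u'à'` in Python 2), and `.encode()` gives the bytes a plain Python 2 literal would hold under the named encoding.

```python
# Python 3 sketch: one character, different byte counts per encoding.
text = "à"                         # Unicode string, one code point (U+00E0)
utf8 = text.encode("utf-8")        # bytes as stored in a UTF-8 source file
latin1 = text.encode("latin-1")    # bytes in the "extended ASCII" Latin-1 table

print(len(text))    # number of characters
print(len(utf8))    # number of bytes in UTF-8
print(len(latin1))  # number of bytes in Latin-1
```

The character fits in one byte only under a single-byte encoding such as Latin-1. In a UTF-8 source file a plain byte-string literal holds two bytes for it, which would explain the two different len() results reported in this thread.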
Fredrik Lundh wrote:
> hg wrote:
>> We noticed that len('à') != len('a')
> sounds odd.
> >>> len('à') == len('a')
> True
> are you perhaps using a UTF-8 editor?
> to keep your sanity, no matter what editor you're using, I recommend
> adding a coding directive to the source file, and using *only* Unicode
> string literals for non-ASCII text.
> or in other words, put this at the top of your file (where "utf-8" is
> whatever your editor/system is using):
> # -*- coding: utf-8 -*-
> and use
> u'<text>'
> for all non-ASCII literals.
> </F>
Hi,
The problem is that:
# -*- coding: utf-8 -*-
import string
print len('a')
print len('à')
prints 1 then 2
and string.maketrans(str1, str2) requires that len(str1) == len(str2)
hg
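The equal-length requirement stops being a problem once the table is built from Unicode strings, since each accented letter then counts as one character. A hedged Python 3 sketch follows, using `str.maketrans` in place of the Python 2 `string.maketrans`. The letter list is an assumed reconstruction and `strip_accents` is an illustrative name.

```python
# Python 3 sketch: translate on Unicode strings, one character per letter.
accented = "àâäéèêëîïôöùüû"
plain = "aaaeeeeiioouuu"

# Both arguments are 14 characters long, so no padding is needed.
table = str.maketrans(accented, plain)


def strip_accents(word: str) -> str:
    """Replace each listed accented letter with its unaccented form."""
    return word.translate(table)


print(strip_accents("déjà vu"))
```

In Python 2 the corresponding route would be a `unicode` string translated with a mapping from code points to replacements, as `string.maketrans` only handles byte strings.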
hg wrote:
> Hi,
> The problem is that:
> # -*- coding: utf-8 -*-
> import string
> print len('a')
> print len('à')
> prints 1 then 2
> and string.maketrans(str1, str2) requires that len(str1) == len(str2)
> hg
PS: I'm running this under Idle