utf - string translation

Published: 2019/6/5 1:26:50 · python


Problem description


Hi,

I'm bringing over a thread that's going on on f.c.l.python.

The point was to get rid of French accents from words.

We noticed that len('à') != len('a') and I found the hack below to fix
the "problem" ... yet I do not understand, especially since 'à' is
included in the extended ASCII table, and thus can be stored in one byte.

Any clue?

hg

# -*- coding: utf-8 -*-
import string

def convert(mot):
    print len(mot)
    print mot[0]
    print '%x' % ord(mot[1])
    # In this UTF-8 file each accented letter is two bytes, so every target
    # letter is padded with '\x00'; the NULs are stripped after translating.
    table = string.maketrans('àâäéèêëîïôöùüû',
                             '\x00a\x00a\x00a\x00e\x00e\x00e\x00e\x00i\x00i\x00o\x00o\x00u\x00u\x00u')
    return mot.translate(table).replace('\x00', '')

c = 'àb?? a '
print convert(c)
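A likely explanation for the length difference: 'à' is U+00E0. Latin-1 (ISO-8859-1, the usual "extended ASCII" table) stores it as the single byte 0xE0, but UTF-8 encodes every code point above U+007F as two or more bytes, here 0xC3 0xA0. With a UTF-8 coding header, a Python 2 plain (byte) string literal keeps those UTF-8 bytes, so len('à') counts two. The sketch below is written in Python 3 rather than the Python 2 of the thread, because Python 3 keeps text (str) and bytes (bytes) separate and makes the byte counts easy to see; the variable names are illustrative.

```python
# Python 3 sketch: one character, two encodings.
ch = "à"  # U+00E0 LATIN SMALL LETTER A WITH GRAVE

utf8_bytes = ch.encode("utf-8")      # what a UTF-8 editor saves to disk
latin1_bytes = ch.encode("latin-1")  # the one-byte "extended ASCII" form

print(len(ch))                          # 1 character
print(utf8_bytes, len(utf8_bytes))      # b'\xc3\xa0' 2
print(latin1_bytes, len(latin1_bytes))  # b'\xe0' 1
```

So 'à' fits in one byte only in a one-byte encoding such as Latin-1; once the file is saved as UTF-8, a byte string holding it is two bytes long.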

Recommended answer

hg wrote:

> We noticed that len('à') != len('a')

sounds odd.

>>> len('à') == len('a')
True

are you perhaps using a UTF-8 editor?

to keep your sanity, no matter what editor you're using, I recommend
adding a coding directive to the source file, and using *only* Unicode
string literals for non-ASCII text.

or in other words, put this at the top of your file (where "utf-8" is
whatever your editor/system is using):

# -*- coding: utf-8 -*-

and use

u'<text>'

for all non-ASCII literals.

</F>
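The "Unicode literals only" advice, sketched in Python 3, where every str literal is already Unicode. (In Python 2 the same idea needs u'...' literals and a dict of code points passed to unicode.translate, since string.maketrans only builds byte tables.) The accented-letter list is an assumption reconstructed from the question's table, and convert is reused only as an illustrative name. The second function swaps in a different technique that the thread does not discuss, NFD decomposition from the standard unicodedata module, which also handles letters such as 'ç' that a hand-written table misses.

```python
import unicodedata

# Assumed letter set, mirroring the question's table (vowels only).
ACCENTED = "àâäéèêëîïôöùüû"
PLAIN = "aaaeeeeiioouuu"

# Both strings have 14 characters, so no padding trick is needed.
table = str.maketrans(ACCENTED, PLAIN)  # maps code points, not bytes


def convert(word: str) -> str:
    """Replace the listed accented vowels with their plain forms."""
    return word.translate(table)


def strip_accents(word: str) -> str:
    """Drop every combining mark after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)  # "é" -> "e" + U+0301
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(len("à") == len("a"))      # True
print(convert("déjà vu"))        # deja vu
print(strip_accents("Ça déjà"))  # Ca deja
```

The table version only touches the letters it lists (convert("ça") leaves the 'ç' alone); the NFD version removes any accent that Unicode represents as a combining mark.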

Fredrik Lundh wrote:

> hg wrote:
>
>> We noticed that len('à') != len('a')
>
> sounds odd.
>
> >>> len('à') == len('a')
> True


Hi,

The problem is that:

# -*- coding: utf-8 -*-
import string
print len('a')
print len('à')

prints 1, then 2

and string.maketrans(str1, str2) requires that len(str1) == len(str2)

hg



PS: I'm running this under Idle
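hg's point about string.maketrans is what makes the question's hack work. Under UTF-8 each accented letter in the table is two bytes, a lead byte 0xC3 followed by a continuation byte, so the source string is twice as long as a plain vowel list. Padding every target vowel with '\x00' makes the lengths match, the lead byte is translated to '\x00', and replace() then deletes it. The Python 3 re-creation below uses bytes.maketrans; the letter list is an assumption (the question's table is garbled in this copy), and convert is an illustrative name. It is only safe while the input holds no other non-ASCII characters: any other letter with lead byte 0xC3, such as 'ç', loses its lead byte and leaves an orphan continuation byte, so decoding fails.

```python
# Python 3 re-creation of the byte-level hack from the question.
# Assumption: every letter below is a two-byte UTF-8 sequence whose lead
# byte is 0xC3 (true for the Latin-1 letters U+00C0..U+00FF).
ACCENTED = "àâäéèêëîïôöùüû"
PLAIN = "aaaeeeeiioouuu"

src = ACCENTED.encode("utf-8")  # 14 letters -> 28 bytes
# Pad each replacement vowel with a NUL so both sides are 28 bytes long.
dst = b"".join(b"\x00" + v.encode("ascii") for v in PLAIN)

table = bytes.maketrans(src, dst)  # ValueError if the lengths differed


def convert(word: str) -> str:
    """Strip the listed accents by translating UTF-8 bytes one at a time."""
    data = word.encode("utf-8").translate(table)  # 0xC3 -> 0x00, 0xA0 -> 'a', ...
    # Drop the NULs that replaced the lead bytes, then decode back to text.
    return data.replace(b"\x00", b"").decode("utf-8")


print(len(src), len(dst))  # 28 28
print(convert("déjà vu"))  # deja vu
```

The same two-byte assumption explains the original symptom in reverse: if the file were saved as Latin-1, the source table would shrink to 14 bytes and string.maketrans would reject the padded 28-byte target.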
