unicode question

Problem description




Hi,

I wonder whether someone could explain to me a bit what's going on here:

import sys

# I'm running Mandrake 10 and Windows XP.
print sys.version

## 2.3.3 (#2, Feb 17 2004, 11:45:40) [GCC 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)]
## 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)]

print "sys.getdefaultencoding = ",sys.getdefaultencoding()
# This always prints "ascii" ..

## just a class
class Y:
    def __str__(self):
        return self.c

## define unicode character (ie. string)
gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"

y = Y()
y.c = gamma

## works fine: prints greek capital gamma on the terminal on Windows (chcp 437).
## On Mandrake 10 nothing gets printed, but at least no exception gets thrown.
print gamma # (1)

## same as before ..
print y.__str__() # (2)

## encoding error
print y # (3) ??????????????

## ascii encoding error ..
sys.stdout.write(gamma) # (4)

I wonder especially about case 2. I can see that "print y" makes a call to
Y.__str__(). But Y.__str__() can be printed?? So what is 'print' exactly
doing?

Thanks for any help,
Wolfgang.



Recommended answer

wolfgang haefelinger wrote:
> I wonder especially about case 2. I can see that "print y" makes a call to
> Y.__str__(). But Y.__str__() can be printed?? So what is 'print' exactly
> doing?




It looks at sys.stdout.encoding. If this is set, and the thing to print
is a unicode string, it converts it to the stream encoding, and prints
the result of the conversion.

Regards,
Martin
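
The conversion described in this reply can be modelled in a few lines. This is only a sketch, written for Python 3 rather than the Python 2.3 used in the thread; the helper name `bytes_written_by_print` and the fallback to a default encoding when the stream declares none are assumptions made for illustration, not the interpreter's actual implementation.

```python
# Python 3 sketch; models the Python 2 behaviour discussed in the thread.
GAMMA = "\N{GREEK CAPITAL LETTER GAMMA}"


def bytes_written_by_print(text, stream_encoding, default_encoding="ascii"):
    """Hypothetical helper: the bytes a Python 2 'print' would emit.

    Assumption: if the stream declares an encoding, it is used;
    otherwise the interpreter-wide default encoding is used.
    """
    encoding = stream_encoding or default_encoding
    return text.encode(encoding)


# Stream with a declared encoding, as on the Windows console (chcp 437).
print(bytes_written_by_print(GAMMA, "cp437"))  # b'\xe2'

# Stream without a declared encoding: falls back to ascii and fails.
try:
    bytes_written_by_print(GAMMA, None)
except UnicodeEncodeError as exc:
    print("no stream encoding:", exc.reason)
```

Under this model, case (1) succeeds on a cp437 console because the stream's own encoding is used, while a stream with no declared encoding would behave like case (4).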


Martin v. Löwis wrote:
> wolfgang haefelinger wrote:
>> I wonder especially about case 2. I can see that "print y" makes a call to
>> Y.__str__(). But Y.__str__() can be printed?? So what is 'print' exactly
>> doing?
>
> It looks at sys.stdout.encoding. If this is set, and the thing to print
> is a unicode string, it converts it to the stream encoding, and prints
> the result of the conversion.



I hate to contradict an expert, but ISTM that it is
sys.getdefaultencoding() ('ascii') that is the problem, not
sys.stdout.encoding ('cp437')

gamma converts to cp437 just fine:

>>> gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"
>>> sys.stdout.encoding
'cp437'
>>> gamma.encode(sys.stdout.encoding)
'\xe2'
>>> print gamma.encode(sys.stdout.encoding)
Γ
(prints a gamma)

Trying to encode gamma using the 'ascii' codec doesn't work:

>>> str(gamma)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0393' in position 0: ordinal not in range(128)

My guess is that internally, print keeps calling str() on its argument
until it gets a string object. So it calls y.__str__() yielding gamma,
then gamma.__str__() which raises the error.

If the default encoding is set to cp437 then it works fine:

>>> import sys
>>> sys.getdefaultencoding()
'cp437'
>>> gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"
>>> str(gamma)
'\xe2'
>>> print gamma
Γ
(prints a gamma)
>>> print str(gamma)
Γ
(prints a gamma)

Kent
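
The str() step discussed in this post can be illustrated with a small model. This is a Python 3 sketch, not the Python 2.3 code from the thread; `py2_str` is a hypothetical helper that assumes Python 2's str() forces the text returned by `__str__` into a byte string using the default encoding, and the `__init__` on `Y` is added only for convenience.

```python
# Python 3 sketch; py2_str is a hypothetical stand-in for Python 2's str().
GAMMA = "\N{GREEK CAPITAL LETTER GAMMA}"


class Y:
    def __init__(self, c):
        self.c = c

    def __str__(self):
        return self.c


def py2_str(obj, default_encoding="ascii"):
    """Model: the text from __str__ is encoded with the default encoding."""
    return obj.__str__().encode(default_encoding)


y = Y(GAMMA)

# With the default encoding 'ascii', the conversion fails as in case (3).
try:
    py2_str(y)
except UnicodeEncodeError as exc:
    print("ascii default fails:", exc.reason)

# With the default encoding changed to cp437, the conversion succeeds.
print(py2_str(y, "cp437"))  # b'\xe2'
```

If this model holds, case (2) differs from case (3) because `y.__str__()` hands print a unicode object directly, so no str() conversion through the default encoding takes place.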




Kent Johnson wrote:
> Martin v. Löwis wrote:
>> wolfgang haefelinger wrote:
>>> I wonder especially about case 2. I can see that "print y" makes a call to
>>> Y.__str__(). But Y.__str__() can be printed?? So what is 'print'
>>> exactly doing?
>>
>> It looks at sys.stdout.encoding. If this is set, and the thing to print
>> is a unicode string, it converts it to the stream encoding, and prints
>> the result of the conversion.
>
> I hate to contradict an expert, but ISTM that it is
> sys.getdefaultencoding() ('ascii') that is the problem, not
> sys.stdout.encoding ('cp437')




It seems we were answering different parts of the question. I answered
the part "What is 'print' exactly doing"; you answered the part as to
what the problem with str() conversion is (although I'm not sure whether
the OP has actually asked that question).

Also, the one case that is interesting here was not in your experiment:
try

print gamma

This should work, regardless of sys.getdefaultencoding(), as long as
sys.stdout.encoding supports the characters to be printed.

Regards,
Martin
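
Whether a given stream encoding "supports the characters to be printed" can be checked by attempting the encoding. The following is a Python 3 sketch; the helper name `encodable` and the list of encodings tried are chosen for illustration only.

```python
# Python 3 sketch: test which encodings can represent a character.
GAMMA = "\N{GREEK CAPITAL LETTER GAMMA}"


def encodable(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# cp437 and utf-8 contain the Greek capital gamma; ascii and latin-1 do not.
for name in ("cp437", "utf-8", "ascii", "latin-1"):
    print(name, encodable(GAMMA, name))
```

A terminal whose encoding fails this check cannot show the character, whatever sys.getdefaultencoding() returns.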

