Python strings outside the 128 range


Problem description









Hi,

Could anyone explain to me how the Python string "é" is mapped to
the binary code "\xe9" in my Python interpreter?

"é" is not present in the 7-bit ASCII table, which is the default
encoding, right? So is the mapping "é" -> "\xe9" portable, or is it
(site-)configuration dependent? Can anyone get something other than
"é" when 'print "\xe9"' is executed? If the process is
config-dependent, what kind of config info is used?

Regards,

SB

Solution

Sébastien Boisgérault wrote:

Hi,

Could anyone explain to me how the Python string "é" is mapped to
the binary code "\xe9" in my Python interpreter?

"é" is not present in the 7-bit ASCII table, which is the default
encoding, right? So is the mapping "é" -> "\xe9" portable, or is it
(site-)configuration dependent? Can anyone get something other than
"é" when 'print "\xe9"' is executed? If the process is
config-dependent, what kind of config info is used?

The default encoding has nothing to do with this. "\xe9" is just a byte.
You can write it to a file (which is basically what the terminal is), with
no default encoding involved at all.
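
As a rough illustration of the point above, the following sketch writes the single byte to a file in binary mode and reads it back. It is written for Python 3, which is an assumption on my part: the thread itself uses Python 2, where a plain "\xe9" string literal is already a byte string. In Python 3 the equivalent is a `bytes` literal. The temporary file is only a stand-in for "a file or terminal".

```python
import os
import tempfile

raw = b"\xe9"  # a single byte with value 233 (Python 3 bytes literal)

# Binary mode: the byte goes to disk as-is, no encoding step is involved.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(raw)
    path = f.name  # temporary path, used only for this illustration

with open(path, "rb") as f:
    stored = f.read()
os.remove(path)

print(len(stored), stored[0], hex(stored[0]))
```

What comes back is the same byte that went in. Nothing in this round trip consults a default encoding.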

The default encoding comes into play when you write unicode(!) strings
to a file. The unicode string is then converted to a byte string using
the default encoding, which will fail miserably if the default encoding
is ascii (as it is supposed to be) and your unicode string contains any
"funny" characters.

But even if you encode the unicode string explicitly with an encoding
like latin1 or utf-8, the resulting byte string is simply written to
the file. Whether the terminal interprets those bytes correctly is a
totally different question (and not actually controllable by you/Python).
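
To make the encoding step concrete, here is a minimal sketch, again assuming Python 3 rather than the Python 2 of the thread. Python 3's `str` plays the role of the Python 2 `unicode` type, and its default encoding is utf-8 rather than ascii, so the ascii failure is triggered explicitly here instead of implicitly.

```python
import sys

text = "\u00e9"  # the character é as a text (Unicode) string

# Explicit encoding: the same character becomes different byte strings.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")

# ASCII has no code for é, so encoding with it fails.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(latin1_bytes, utf8_bytes, ascii_failed)

# Configuration that influences implicit conversions and terminal output;
# the values printed here depend on the machine and environment.
print(sys.getdefaultencoding(), sys.stdout.encoding)
```

The byte "\xe9" from the question is what the latin-1 encoding of é produces. Under utf-8 the same character becomes two bytes.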

Diez


Sébastien Boisgérault wrote:

Could anyone explain to me how the Python string "é" is mapped to
the binary code "\xe9" in my Python interpreter?

in the iso-8859-1 character set, the character é is represented by the code
0xE9 (233 in decimal). there's no mapping going on here; there's only one
character in the string. how it appears on your screen depends on how you
print it, and what encoding your terminal is using.

>>> s = "é"
>>> len(s)
1
>>> ord(s)
233
>>> hex(ord(s))
'0xe9'
>>> s
'\xe9'
>>> print repr(s)
'\xe9'
>>> print s
é
>>> print chr(233)
é

</F>
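
The dependence on the terminal's encoding can be sketched by decoding the same byte under a few different encodings. This is an illustrative Python 3 example, not part of the original session. The choice of cp437 (an old DOS code page) and utf-8 as "other terminal encodings" is mine.

```python
raw = b"\xe9"  # the byte from the session above

# A latin-1 (iso-8859-1) terminal shows this byte as é.
as_latin1 = raw.decode("iso-8859-1")

# A cp437 terminal maps the same byte to a different character.
as_cp437 = raw.decode("cp437")

# On its own, the byte is not a valid UTF-8 sequence at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_latin1), repr(as_cp437), utf8_ok)
```

So 'print "\xe9"' can show something other than é: the byte sent is always the same, but how it is rendered is decided by whatever reads it.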



Fredrik Lundh wrote:

in the iso-8859-1 character set, the character é is represented by the code
0xE9 (233 in decimal). there's no mapping going on here; there's only one
character in the string. how it appears on your screen depends on how you
print it, and what encoding your terminal is using.

Crystal clear. Thanks!

SB

