What does the Python print() function actually do?

Question

I was looking at this question and started wondering what print actually does.

I have never found out how to use string.decode() and string.encode() to get a unicode string "out" in the Python interactive shell in the same format as print does. No matter what I do, I get either

  1. UnicodeEncodeError or
  2. the escaped string with "\x##" notation...

This is python 2.x, but I'm already trying to mend my ways and actually call print() :)

Example:

>>> import sys
>>> a = '\xAA\xBB\xCC'
>>> print(a)
ª»Ì
>>> a.encode(sys.stdout.encoding)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 0: ordinal not in range(128)
>>> a.decode(sys.stdout.encoding)
u'\xaa\xbb\xcc'

EDIT:

Why am I asking this? I am sick and tired of encode() errors and realized that print can do it (at least in the interactive shell). I know that there MUST BE A WAY to magically do the encoding PROPERLY, by digging up from somewhere the information about which encoding to use...

ADDITIONAL INFO: I'm running Python 2.4.3 (#1, Sep 3 2009, 15:37:12) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2

>>> sys.stdin.encoding
'ISO-8859-1'
>>> sys.stdout.encoding
'ISO-8859-1'

However, the results are the same with Python 2.6.2 (r262:71600, Sep 8 2009, 13:06:43) on the same linux box.

Solution

EDIT: (Major changes between this edit and the previous one... Note: I'm using Python 2.6.4 on an Ubuntu box.)

Firstly, in my first attempt at an answer, I provided some general information on print and str which I'm going to leave below for the benefit of anybody having simpler issues with print and chancing upon this question. As for a new attempt at dealing with the issue experienced by the OP... Basically, I'm inclined to say that there's no silver bullet here and if print somehow manages to make sense of a weird string literal, then that's not reproducible behaviour. I'm led to this conclusion by the following funny interaction with Python in my terminal window:

>>> print '\xaa\xbb\xcc'
��

Have you tried to input ª»Ì directly from the terminal? At a Linux terminal using utf-8 as the encoding, this is actually read in as six bytes, which can then be made to look like three unicode chars with the help of the decode method:

>>> 'ª»Ì'
'\xc2\xaa\xc2\xbb\xc3\x8c'
>>> 'ª»Ì'.decode(sys.stdin.encoding)
u'\xaa\xbb\xcc'

So, the '\xaa\xbb\xcc' literal only makes sense if you decode it as a latin-1 literal (well, actually you could use a different encoding which agrees with latin-1 on the relevant characters). As for print 'just working' in your case, it certainly doesn't for me -- as mentioned above.
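To make the byte counts above concrete, here is a minimal sketch. It assumes Python 3 (the session above is Python 2), where text is `str` and raw data is `bytes`; the variable names are only illustrative.

```python
# Python 3 sketch (the answer itself uses Python 2).
text = "\u00aa\u00bb\u00cc"  # the three characters ª»Ì

utf8_bytes = text.encode("utf-8")      # what a UTF-8 terminal would send
latin1_bytes = text.encode("latin-1")  # what a Latin-1 terminal would send

print(len(utf8_bytes), utf8_bytes)      # 6 b'\xc2\xaa\xc2\xbb\xc3\x8c'
print(len(latin1_bytes), latin1_bytes)  # 3 b'\xaa\xbb\xcc'

# cp1252 agrees with latin-1 on these three bytes; latin2 does not.
same = latin1_bytes.decode("cp1252") == text
different = latin1_bytes.decode("latin2") != text
print(same, different)  # True True
```

The same three bytes therefore mean different things depending on which encoding the reader assumes, which is why a byte string alone cannot be printed "correctly" everywhere.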

This is explained by the fact that when you use a string literal not prefixed with a u -- i.e. "asdf" rather than u"asdf" -- the resulting string will use some non-unicode encoding. No; as a matter of fact, the string object itself is going to be encoding-unaware, and you're going to have to treat it as if it was encoded with encoding x, for the correct value of x. This basic idea leads me to the following:

a = '\xAA\xBB\xCC'
a.decode('latin1')
# result: u'\xAA\xBB\xCC'
print(a.decode('latin1'))
# output: ª»Ì

Note the lack of decoding errors and the proper output (which I expect to stay proper on any other box). Apparently your string literal can be made sense of by Python, but not without some help.
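A small sketch of why naming the encoding matters, again assuming Python 3 rather than the Python 2 used above. In Python 2, calling encode() on a byte string first decodes it implicitly with the 'ascii' codec, which is where the UnicodeDecodeError in the question comes from; the helper `try_decode` below is hypothetical and just makes that step explicit.

```python
raw = b"\xaa\xbb\xcc"

def try_decode(data, encoding):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)

# Naming the right encoding works.
text, err = try_decode(raw, "latin-1")
print(ascii(text), err)  # '\xaa\xbb\xcc' None

# Falling back to ASCII fails on the very first byte.
text, err = try_decode(raw, "ascii")
print(text, err)
```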

Does this help? (At least in understanding how things work, if not in making the handling of encodings any easier...)


Now for some funny bits with some explanatory value (hopefully)! This works fine for me:

sys.stdout.write("\xAA\xBB\xCC".decode('latin1').encode(sys.stdout.encoding))

Skipping either the decode or the encode part results in a unicode-related exception. Theoretically speaking, this makes sense, as the first decode is needed to decide what characters there are in the given string (the only thing obvious on first sight is what bytes there are -- the Python 3 idea of having (unicode) strings for characters and bytes for, well, bytes, suddenly seems superbly reasonable), while the encode is needed so that the output respects the encoding of the output stream. Now this
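The decode-then-encode round trip described above can be sketched as follows. This assumes Python 3; the function name `reencode` and the `errors="replace"` fallback are my own additions for illustration, not part of the original answer.

```python
import sys

def reencode(raw, source_encoding, target_encoding):
    """Decode raw bytes to text, then encode the text for the target stream."""
    text = raw.decode(source_encoding)  # decide which characters these are
    # Assumption: replace unencodable characters instead of raising.
    return text.encode(target_encoding, errors="replace")

# Fall back to UTF-8 if the stream does not report an encoding.
target = sys.stdout.encoding or "utf-8"
out = reencode(b"\xaa\xbb\xcc", "latin-1", target)
print(repr(out))
# sys.stdout.buffer.write(out) would emit these bytes unchanged.
```

In Python 3, print() performs the encode half for you when given a `str`, so only the decode step has to be written by hand.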

sys.stdout.write("ąöî\n".decode(sys.stdin.encoding).encode(sys.stdout.encoding))

also works as expected, but the characters are actually coming from the keyboard and so are actually encoded with the stdin encoding... Also,

ord('ą'.decode('utf-8').encode('latin2'))

returns the correct 177 (my input encoding is utf-8), but '\xc4\x85'.encode('latin2') makes no sense to Python, as it has no clue as to how to make sense of '\xc4\x85' and figures that trying the 'ascii' codec is the best it can do.
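For comparison, a hedged Python 3 version of the same experiment. Here the distinction is enforced by the types: `bytes` objects have no encode() method at all, so the ambiguous call cannot even be written.

```python
utf8_input = b"\xc4\x85"  # the UTF-8 bytes for the character ą

char = utf8_input.decode("utf-8")    # bytes -> text
latin2_bytes = char.encode("latin2")  # text -> ISO-8859-2 bytes

print(len(char))        # 1
print(latin2_bytes[0])  # 177

# Python 3 bytes cannot be "encoded" again, so no implicit ASCII step exists.
print(hasattr(utf8_input, "encode"))  # False
```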


The original answer:

The relevant bit of Python docs (for version 2.6.4) says that print(obj) is meant to print out the string given by str(obj). I suppose you could then wrap it in a call to unicode (as in unicode(str(obj))) to get a unicode string out -- or you could just use Python 3 and exchange this particular nuisance for a couple of different ones. ;-)

Incidentally, this shows that you can manipulate the result of printing an object just like you can manipulate the result of calling str on an object, that is by messing with the __str__ method. Example:

class Foo(object):
    def __str__(self):
        return "I'm a Foo!"

print Foo()
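The same idea as a runnable Python 3 sketch, where print is a function. Capturing stdout with `contextlib.redirect_stdout` is only there to show exactly what gets written; it is not part of the original example.

```python
import contextlib
import io

class Foo:
    def __str__(self):
        return "I'm a Foo!"

# Capture what print() writes so it can be inspected.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print(Foo())

captured = buffer.getvalue()
print(repr(captured))  # "I'm a Foo!\n"
```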

As for the actual implementation of print, I expect this won't be useful at all, but if you really want to know what's going on... It's in the file Python/bltinmodule.c in the Python sources (I'm looking at version 2.6.4). Search for a line beginning with builtin_print. It's actually entirely straightforward, no magic going on there. :-)

Hopefully this answers your question... But if you do have a more arcane problem which I'm missing entirely, do comment, I'll make a second attempt. Also, I'm assuming we're dealing with Python 2.x; otherwise I guess I wouldn't have a useful comment.
