How to make python 3 print() utf8


Problem description


How can I make Python 3 (3.1) print("Some text") write to stdout in UTF-8, or how can I output raw bytes?

Test.py

import sys

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
print(TestText)
print(TestText.encode("utf8"))
print(TestText.encode("cp1252","replace"))
print(TestText2)

Output (in CP1257; I replaced the characters with their byte values, written as [x00]):

utf-8
cp1257
Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
b'Test - ??????..\x9a\x8a??\x9e\x8e'
b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'

print is just too smart... :D There's no point in passing encoded text to print (it only ever shows the representation of the bytes, not the real bytes), and it seems impossible to output bytes at all, because print always encodes its output in sys.stdout.encoding.

For example: print(chr(255)) throws an error:

Traceback (most recent call last):
  File "Test.py", line 1, in <module>
    print(chr(255));
  File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>

By the way, print(TestText == TestText2.decode("utf8")) returns False, although the printed output looks the same.


How does Python 3 determine sys.stdout.encoding and how can I change it?

I made a printRAW() function which works fine (actually it encodes output to UTF-8, so really it's not raw...):

 def printRAW(*Text):
     RAWOut = open(1, 'w', encoding='utf8', closefd=False)
     print(*Text, file=RAWOut)
     RAWOut.flush()
     RAWOut.close()

 printRAW("Cool", TestText)

Output (now it prints in UTF-8):

Cool Test - āĀēĒčČ..šŠūŪžŽ

printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)

Now I'm looking for a better solution, if there is one...

Solution

Clarification:

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8... it is a Unicode string in Python 3.X.
TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string.
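
The distinction can be checked directly. The following is a minimal sketch, not part of the original answer; the names text, encoded and describe are made up for illustration. It shows that a str counts code points while its UTF-8 encoding counts bytes, and that decoding the bytes gives back an equal str. In the question, the comparison presumably returned False simply because the byte literal starts with "Test2" rather than "Test".

```python
# Hypothetical names; the sample text is the one used in the question.
text = "Test - āĀēĒčČ..šŠūŪžŽ"   # str: a sequence of Unicode code points
encoded = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text


def describe(s):
    """Return (number of code points, number of UTF-8 bytes) for a str."""
    return len(s), len(s.encode("utf-8"))


# Only ASCII is printed here, so this works on any console encoding.
print(type(text).__name__, type(encoded).__name__)
print(describe(text))
print(encoded.decode("utf-8") == text)
```

Each accented letter in the sample is a single code point but occupies two bytes in UTF-8, which is why the two counts differ.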

To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:

import sys
sys.stdout.buffer.write(TestText2)
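
As a rough sketch of how this might be wrapped for reuse (the helper names write_utf8 and force_utf8_stdout are hypothetical, not from the answer), the first function below encodes a str and writes the bytes to a binary stream, defaulting to sys.stdout.buffer. The second relies on sys.stdout.reconfigure(), which only exists on Python 3.7 and later, so it would not have been available on the 3.1 interpreter from the question.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write it, plus a newline, to a binary stream.

    stream defaults to sys.stdout.buffer, the binary layer under sys.stdout.
    Returns the number of bytes written.
    """
    if stream is None:
        sys.stdout.flush()          # keep ordering with earlier print() calls
        stream = sys.stdout.buffer
    data = (text + "\n").encode("utf-8")
    stream.write(data)
    stream.flush()
    return len(data)


def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 in place; assumes Python 3.7+.

    Returns False when the current stdout object has no reconfigure().
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


# Demonstration against an in-memory stream instead of the real console.
demo = io.BytesIO()
write_utf8(chr(252), demo)
print(demo.getvalue())  # b'\xc3\xbc\n'
```

Calling write_utf8("Cool " + TestText) with no stream argument would send the UTF-8 bytes to the console, much like the printRAW() helper in the question. Setting the PYTHONIOENCODING environment variable to utf-8 before starting the interpreter is another way to change sys.stdout.encoding without touching the code.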
