Python lists with Scandinavian letters
Can anyone explain what causes this for better understanding of the environment?
emacs, unix
input:
with open("example.txt", "r") as f:
    for files in f:
        print files
        split = files.split()
        print split
output:
Hello world
['Hello', 'world']
Hello wörld
['Hello', 'w\xf6rld']
When you print a list, Python shows the string representation (the repr()) of each element, and here one element contains a non-printable byte. Non-printable bytes (anything outside the ASCII range, or a control character) are displayed as escape sequences.
The point is that you can copy that representation and paste it into Python code or into the interpreter, producing the exact same value.
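A minimal sketch of that round trip. It is written for Python 3, where the byte string has to be spelled as a bytes literal, whereas the question uses Python 2. The variable names are illustrative only.

```python
import ast

# The bytes for "wörld" as they would appear in a Latin-1 encoded file
# (the file's encoding is an assumption here).
word = b"w\xf6rld"

# repr() escapes the non-ASCII byte, so the result is plain ASCII text.
shown = repr(word)
print(shown)  # b'w\xf6rld'

# Pasting that representation back into Python yields the same value.
restored = ast.literal_eval(shown)
print(restored == word)  # True

# A list displays the repr() of each element, which is why the escape
# shows up in the list output but not when the line itself is printed.
print([b"Hello", word])  # [b'Hello', b'w\xf6rld']
```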
The \xf6 escape code represents a byte with hex value F6, which, when interpreted as a Latin-1 byte value, is the ö character.
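To see the mapping concretely, here is a short Python 3 sketch. It assumes the byte really is Latin-1, which is what the output in the question suggests.

```python
# A single byte with hex value F6.
raw = b"\xf6"
print(hex(raw[0]))  # 0xf6

# Interpreting the byte as Latin-1 gives the character ö (U+00F6).
char = raw.decode("latin-1")
print(char == "\u00f6")  # True

# The same character is two bytes in UTF-8, so the byte value alone
# does not identify a character until you know the encoding.
print(char.encode("utf-8"))  # b'\xc3\xb6'
```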
You probably want to decode that value to Unicode to handle the data consistently. If you don't yet know what Unicode really is, or want to know anything else about encodings, see:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
Pragmatic Unicode by Ned Batchelder
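One way to do that decoding is to let Python decode the file as it reads it. The sketch below assumes the file is Latin-1 encoded and writes its own sample file to a temporary directory so that it runs on its own. io.open accepts an encoding argument in both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Create a sample file with the same content as in the question,
# stored as Latin-1 bytes (assumed encoding).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with io.open(path, "wb") as f:
    f.write(b"Hello world\nHello w\xf6rld\n")

# Opening with an explicit encoding yields Unicode text, not raw bytes.
words = []
with io.open(path, "r", encoding="latin-1") as f:
    for line in f:
        words.extend(line.split())

print(words)
```

Under Python 3 the list shows the ö directly. Under Python 2 the elements are unicode objects, so the list would still show an escape (u'w\xf6rld'), but the value is now a decoded character, not a raw byte.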