Special characters appearing as question marks
Question
Using the Python programming language, I'm having trouble outputting characters such as å, ä and ö. The following code gives me a question mark (?) as output, not an å:
#coding: iso-8859-1
input = "å"
print input
The following code lets you enter arbitrary text. The for loop goes through each character of the input and appends it to the string variable a, and the resulting string is then printed. This code works correctly: you can enter å, ä and ö and the output is still correct. For example, "år" outputs "år" as expected.
#coding: iso-8859-1
input = raw_input("Test: ")
a = ""
for i in range(0, len(input)):
    a = a + input[i]
print a
What's interesting is that if I change input = raw_input("Test: ") to input = "år", it outputs a question mark (?) for the "å".
#coding: iso-8859-1
input = "år"
a = ""
for i in range(0, len(input)):
    a = a + input[i]
print a
For what it's worth, I'm using TextWrangler, and my document's character encoding is set to ISO Latin 1. What causes this? How can I solve the problem?
Recommended answer
You're using Python 2, and I assume you're running it on a platform, such as Linux, whose terminal encodes I/O as UTF-8.
Python 2's "..." literals represent byte strings. So when you write "år" in your ISO 8859-1-encoded source file, the variable input holds the bytes b'\xe5r' (0xE5 is å in ISO 8859-1). When you print it, those raw bytes are sent to the console unchanged, but the console decodes them as UTF-8; 0xE5 followed by r is not valid UTF-8, so the å shows up as a question mark.
To demonstrate, try it with print repr(a) instead of print a.
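To make the mismatch concrete without a Python 2 interpreter, here is a minimal sketch in Python 3, where bytes plays the role of a Python 2 "..." literal. It assumes the terminal decodes everything it receives as UTF-8, and it uses errors="replace" only as a model of what a terminal does with invalid input; real terminals vary in the glyph they draw.

```python
# Python 3 sketch: `bytes` stands in for a Python 2 "..." literal.
# Assumption: the terminal decodes everything it receives as UTF-8.

# The bytes an ISO-8859-1 source file stores for the literal "år"
latin1_bytes = "år".encode("iso-8859-1")
print(repr(latin1_bytes))  # b'\xe5r'

# 0xE5 announces a 3-byte UTF-8 sequence, but the next byte ("r") is not a
# continuation byte, so strict UTF-8 decoding fails.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# A terminal usually substitutes a replacement glyph instead of failing;
# errors="replace" models that with U+FFFD, which many fonts draw as "?" or "�".
shown = latin1_bytes.decode("utf-8", errors="replace")
print(valid_utf8, ascii(shown))  # False '\ufffdr'
```

In Python 2, print repr(a) on the same data shows '\xe5r' (without the b prefix), which is the same byte sequence.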
When you use raw_input(), the user's input is already UTF-8-encoded, so it is output correctly.
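This also explains why the character-by-character loop in the question does no harm: on Python 2 it actually walks over bytes, and copying UTF-8 bytes in order reproduces valid UTF-8. The sketch below is Python 3; typed is a hypothetical stand-in for what raw_input() would return when "år" is typed into a UTF-8 terminal.

```python
# Python 3 sketch; `typed` is an assumed stand-in for what Python 2's
# raw_input() returns when "år" is typed into a UTF-8 terminal.
typed = "år".encode("utf-8")
print(repr(typed), len(typed))  # b'\xc3\xa5r' 3  ("å" needs two bytes)

# The question's loop, adapted to Python 3 bytes: it copies one byte at a
# time, and concatenating them in order rebuilds the same valid UTF-8 data.
a = b""
for i in range(0, len(typed)):
    a = a + typed[i:i + 1]  # slice; typed[i] would be an int in Python 3

print(a == typed, a.decode("utf-8") == "år")  # True True
```

Note that len(typed) is 3 even though only two characters were entered, which is a hint that the loop is iterating over bytes rather than characters.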
To fix this, either:
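- Decode your byte string from ISO-8859-1 and re-encode it as UTF-8 before printing it. (Calling a.encode('utf-8') directly on a Python 2 byte string first decodes it implicitly as ASCII, which raises UnicodeDecodeError on the å byte.)
  print a.decode('iso-8859-1').encode('utf-8')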
- Use Unicode strings (u'text') instead of byte strings. Be careful when decoding the input, since on Python 2 raw_input() returns a byte string rather than a text string. If you know the input is UTF-8, use raw_input().decode('utf-8').
- Encode your source file in UTF-8 instead of ISO-8859-1, and change the declaration to # coding: utf-8 to match. The byte-string literal will then already be in UTF-8.
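The three fixes can be checked side by side in a rough Python 3 sketch. It assumes the literal was saved as ISO-8859-1 and that the terminal expects UTF-8; bytes again stands in for Python 2 byte strings, and the Python 2 spelling of each step is given in the comments.

```python
# Python 3 sketch of the three fixes; bytes stand in for Python 2 byte strings.
# Assumptions: the literal was saved as ISO-8859-1 and the terminal expects UTF-8.

latin1_literal = b"\xe5r"  # what "år" becomes in an ISO-8859-1 source file

# Fix 1: decode from the real encoding, then re-encode for the terminal
# (Python 2: a.decode('iso-8859-1').encode('utf-8')).
fix1 = latin1_literal.decode("iso-8859-1").encode("utf-8")

# Fix 2: work with text, decoding input bytes once at the boundary
# (Python 2: u"år" literals and raw_input().decode('utf-8')).
typed = "år".encode("utf-8")  # stand-in for raw_input() on a UTF-8 terminal
text = typed.decode("utf-8")
a = ""
for ch in text:               # iterates over characters, not bytes
    a = a + ch

# Fix 3: a UTF-8 source file stores the literal's bytes as UTF-8 to begin with.
fix3 = "år".encode("utf-8")

print(repr(fix1))             # b'\xc3\xa5r'
print(len(text), ascii(a))    # 2 '\xe5r'
print(fix1 == fix3 == typed)  # True
```

All three routes end with the same UTF-8 bytes reaching the terminal; the Unicode-string approach has the extra benefit that len() and iteration count characters rather than bytes.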