Special characters appearing as question marks

Question

Using the Python programming language, I'm having trouble outputting characters such as å, ä and ö. The following code gives me a question mark (?) as output, not an å:

#coding: iso-8859-1
input = "å"
print input

The following code lets you input random text. The for-loop goes through each character of the input, adds them to the string variable a and then outputs the resulting string. This code works correctly; you can input å, ä and ö and the output will still be correct. For example, "år" outputs "år" as expected.

#coding: iso-8859-1
input = raw_input("Test: ")
a = ""
for i in range(0, len(input)):
    a = a + input[i]
print a

What's interesting is that if I change input = raw_input("Test: ") to input = "år", it will output a question mark (?) for the "å".

#coding: iso-8859-1
input = "år"
a = ""
for i in range(0, len(input)):
    a = a + input[i]
print a

For what it's worth, I'm using TextWrangler, and my document's character encoding is set to ISO Latin 1. What causes this? How can I solve the problem?

Recommended answer

You're using Python 2, I assume running on a platform like Linux that encodes I/O in UTF-8.

Python 2's "" literals represent byte-strings. So when you specify "år" in your ISO 8859-1-encoded source file, the variable input has the value b'\xe5r'. When you print this, the raw bytes are output to the console, but show up as a question-mark because they are not valid UTF-8.

To demonstrate, try it with print repr(a) instead of print a.
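
As a rough illustration of why the console shows a question mark, the sketch below reproduces the byte values involved. It is written for Python 3 (an assumption, since the question uses Python 2) because bytes and text are separate types there, which makes the values explicit. The variable names are made up for this example.

```python
# "\u00e5r" is the text "år"; the escape keeps this file pure ASCII.
latin1_bytes = "\u00e5r".encode("iso-8859-1")  # what the ISO 8859-1 source file stores
print(repr(latin1_bytes))                      # b'\xe5r'

# 0xE5 announces a three-byte UTF-8 sequence, but 'r' is not a continuation byte,
# so a UTF-8 decoder (such as a terminal) rejects the data.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)                              # False

# A lenient decoder substitutes a replacement character for the bad byte,
# which is roughly what the terminal does when it draws "?".
shown = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(shown))                            # '\ufffdr'
```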

When you use raw_input(), the user's input is already UTF-8-encoded, and so is output correctly.
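
The raw_input() case can be sketched the same way. The example below assumes a UTF-8 terminal and uses Python 3 bytes to stand in for the Python 2 byte-string. The names typed_bytes, rebuilt and round_trip are hypothetical. It also shows that the loop in the question copies bytes rather than characters, which is harmless because the bytes are reassembled in the same order.

```python
# Bytes a UTF-8 terminal would hand to raw_input() for the text "år".
typed_bytes = "\u00e5r".encode("utf-8")
print(repr(typed_bytes))    # b'\xc3\xa5r'
print(len(typed_bytes))     # 3: "å" occupies two bytes in UTF-8

# Equivalent of the question's loop: copy one byte at a time.
rebuilt = b""
for i in range(len(typed_bytes)):
    rebuilt += typed_bytes[i:i + 1]

# The reassembled bytes are still valid UTF-8, so the terminal renders them correctly.
round_trip = rebuilt.decode("utf-8")
print(ascii(round_trip))    # '\xe5r'
```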

To fix this, either:

  • Decode the byte-string from ISO 8859-1 and re-encode it as UTF-8 before printing it:

print a.decode('iso-8859-1').encode('utf-8')

  • Use Unicode strings (u'text') instead of byte-strings. You will need to be careful with decoding the input, since on Python 2, raw_input() returns a byte-string rather than a text string. If you know the input is UTF-8, use raw_input().decode('utf-8').

  • Encode your source file in UTF-8 instead of iso-8859-1, and update the #coding line to match. Then the byte-string literal will already be in UTF-8.
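
The options above could be sketched as follows. This is a Python 3 approximation rather than the Python 2 code from the question, and latin1_to_utf8 is a hypothetical helper name introduced only for illustration.

```python
def latin1_to_utf8(raw):
    """Re-encode ISO 8859-1 bytes as UTF-8 (hypothetical helper)."""
    return raw.decode("iso-8859-1").encode("utf-8")

# Option 1: convert the Latin-1 literal's bytes before sending them to a UTF-8 console.
fixed = latin1_to_utf8(b"\xe5r")
print(repr(fixed))          # b'\xc3\xa5r'

# Option 2: work with text instead of bytes by decoding the input once, up front.
# This mirrors raw_input().decode('utf-8') on Python 2.
text = b"\xc3\xa5r".decode("utf-8")
print(len(text))            # 2: two characters, not three bytes

# Option 3: a UTF-8 source file already stores the literal as these bytes,
# so no conversion is needed before printing.
print(fixed == "\u00e5r".encode("utf-8"))   # True
```

Working with text (option 2) tends to be the least error-prone, because lengths and indexing then refer to characters rather than bytes.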
