python中的中文和日文字符支持 [英] Chinese and Japanese character support in python

查看:184
本文介绍了python中的中文和日文字符支持的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何正确阅读日文和中文字符.我正在使用 python 2.5.输出显示为 "E:\Test\?????????"

How to read correctly japanese and chinese characters. I'm using python 2.5. Output is displayed as "E:\Test\?????????"

path = r"E:\Test\は最高のプログラマ"
t = path.encode()
print t
u = path.decode()
print u
t = path.encode("utf-8")
print t
t = path.decode("utf-8")
print t

推荐答案

请阅读 Python Unicode 操作指南;它解释了如何在 Python 代码中处理和包含非 ASCII 文本.

Please do read the Python Unicode HOWTO; it explains how to process and include non-ASCII text in your Python code.

如果您想在代码中包含日语文本文字,您有多种选择:

If you want to include Japanese text literals in your code, you have several options:

  • 使用 unicode 文字(创建 unicode 对象而不是字节字符串),但任何非 ascii 代码点都由 unicode 转义字符表示.它们采用 \uabcd 的形式,所以一个反斜杠、一个 u 和 4 个十六进制数字:

  • Use unicode literals (create unicode objects instead of byte strings), but any non-ascii codepoint is represented by a unicode escape character. They take the form of \uabcd, so a backslash, a u and 4 hexadecimal digits:

ru = u'\u30EB'

将是一个字符,片假名ru"代码点(ル").

would be one character, the katakana 'ru' codepoint ('ル').

使用 unicode 文字,但包括以某种编码形式的字符.您的文本编辑器将以给定的编码(例如 UTF-16)保存文件;您需要在源文件的顶部声明该编码:

Use unicode literals, but include the characters in some form of encoding. Your text editor will save files in a given encoding (say, UTF-16); you need to declare that encoding at the top of the source file:

# encoding: utf-16

ru = u'ル'

在不使用转义的情况下包含ル".Python 2 文件的默认编码是 ASCII,因此通过声明编码,您可以直接使用日语.

where 'ル' is included without using an escape. The default encoding for Python 2 files is ASCII, so by declaring an encoding you make it possible to use Japanese directly.

使用字节字符串文字,准备好编码.通过其他方式对代码点进行编码,并将它们包含在您的字节字符串文字中.如果您打算对它们做的只是以编码形式使用它们,那应该没问题:

Use byte string literals, ready encoded. Encode the codepoints by some other means and include them in your byte string literals. If all you are going to do with them is use them in encoded form anyway, this should be fine:

ru = '\xeb\x30'  # ru encoded to UTF16 little-endian

我将ル"编码为 UTF-16 little-endian,因为这是默认的 Windows NTFS 文件名编码.

I encoded 'ル' to UTF-16 little-endian because that's the default Windows NTFS filename encoding.

下一个问题将是您的终端,Windows 控制台因不支持许多开箱即用的字符集而臭名昭著.您可能希望将其配置为处理 UTF-8.有关详细信息,请参阅这个问题,但您需要运行控制台中的以下命令:

Next problem will be your terminal, the Windows console is notorious for not supporting many character sets out of the box. You probably want to configure it to handle UTF-8 instead. See this question for some details, but you need to run the following command in the console:

chcp 65001

要切换到 UTF-8,您可能需要切换到可以处理代码点的控制台字体(也许是 Lucida?).

to switch to UTF-8, and you may need to switch to a console font that can handle your codepoints (Lucida perhaps?).

这篇关于python中的中文和日文字符支持的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆