A question about Chinese characters in a Python program


Problem description


Hope you all had a nice weekend.

I have a question that I hope someone can help me with. I want to run a Python program that uses Tkinter for the user interface (GUI). The program allows me to type Chinese characters, but nevertheless is unable to display them on screen. The following is part of the error message I received after I exited the program:

"Could not write output: <type 'exceptions.UnicodeEncodeError'>, 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)"

Any suggestion will be appreciated.

Sincerely,

Liang
Liang Chen, Ph.D.
Assistant Professor
University of Georgia
Communication Sciences and Special Education
542 Aderhold Hall
Athens, GA 30602

Phone: 706-542-4566
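For context, this is the error Python raises when text containing non-ASCII characters is encoded with the ASCII codec, which happens implicitly when output goes somewhere Python believes is ASCII-only. A minimal reproduction, written for Python 3 (the thread concerns Python 2, where the exception type prints as `<type 'exceptions.UnicodeEncodeError'>`); the helper name and the two-character sample string are assumptions standing in for whatever the program actually tried to write:

```python
def ascii_error_message(text: str) -> str:
    """Encode text as ASCII and return the error message, or "no error" if it succeeds."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return "no error"


# "\u4e2d\u6587" is the two-character word for "Chinese"; its characters sit at positions 0 and 1.
print(ascii_error_message("\u4e2d\u6587"))
# 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
```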

Solution

On Oct 20, 10:48 am, Liang Chen <c...@uga.edu> wrote:

> "Could not write output: <type 'exceptions.UnicodeEncodeError'>, 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)"

Personally I call it a serious bug in Python, but sadly most members of the
Python community do not agree. It may be an internal str() call that caused
this issue.

https://groups.google.com/group/comp...6ade6b6f5f3052
http://bugs.python.org/issue3648
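In Python 2, `str(u'...')` on non-ASCII text fails with exactly this ASCII error, because str() implicitly encodes Unicode with the default ASCII codec. In Python 3, str() no longer encodes anything; the same implicit step instead happens wherever text is written through a byte-oriented stream. A sketch of that boundary, assuming an in-memory buffer stands in for a console or log whose encoding is ASCII (the helper name is illustrative):

```python
import io


def write_text(text: str, encoding: str) -> bytes:
    """Write text through a text stream using `encoding` and return the bytes produced."""
    raw = io.BytesIO()
    # The TextIOWrapper performs the implicit text -> bytes step with its configured encoding.
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


try:
    write_text("\u4e2d\u6587", "ascii")  # Chinese text into an ASCII-only destination
except UnicodeEncodeError as exc:
    print("ascii destination failed:", exc)

print("utf-8 destination:", write_text("\u4e2d\u6587", "utf-8"))
```

The text is identical in both calls; only the encoding configured on the destination decides whether the write succeeds.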


On 20 Okt, 07:32, est <electronix...@gmail.com> wrote:

> Personally I call it a serious bug in Python

Normally I'd entertain the possibility of bugs in Python, but your
reasoning is a bit thin (in http://bugs.python.org/issue3648): "Why
can't Python just define ascii to range(256)?"

I do accept that it can be awkward to output text to the console, for
example, but you have to consider that the console might not be
configured to display any character you can throw at it. My console is
configured for ISO-8859-15 (something like your magical "ascii to
range(256)", only where someone has to decide what those 256 characters
actually are), but that isn't going to help me display CJK characters.
A solution might be to generate UTF-8 and then get the user to display
the output in an appropriately configured application, but even then
someone has to say that it's UTF-8 and not some other encoding that's
being used. As discussed in another recent thread, Python 2.x does
make some reasonable guesses about such matters, to the extent that
this is possible automatically (without magical knowledge).
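Paul's point about the console can be checked directly: whether text is displayable depends on the target encoding, and a 256-character set such as ISO-8859-15, or Latin-1 (which maps bytes 0-255 straight onto the first 256 code points, close to the "ascii to range(256)" est asks for), has no slots for CJK characters. A small sketch; the helper and sample names are illustrative:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


LATIN_SAMPLE = "caf\u00e9"   # "cafe" with an accented e: fits in any Latin-1-style set
CJK_SAMPLE = "\u4e2d\u6587"  # two Chinese characters

for enc in ("ascii", "latin-1", "iso-8859-15", "utf-8"):
    print(f"{enc:12} latin={can_encode(LATIN_SAMPLE, enc)} cjk={can_encode(CJK_SAMPLE, enc)}")
```

Only UTF-8 accepts both samples, which is why Paul suggests producing UTF-8 and then making sure the display side is configured for it.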

There is also the problem with the use of the "str" built-in function or
any operation where a Unicode object may be converted to a plain
string. It is now recommended that you only convert to plain strings
when you need to produce a sequence of bytes (for output, for
example), and that you indicate how the Unicode values are encoded as
bytes (by specifying an encoding). Python 3.x doesn't really change
this: it just makes the Unicode/text vs. bytes distinction more
obvious.
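The practice Paul recommends, keeping text as Unicode inside the program and converting to bytes only at the output boundary with a named encoding, looks like this in Python 3 (Python 2 has the same `encode`/`decode` methods on `unicode` and `str` objects). UTF-8 and GB2312 are chosen only for illustration:

```python
text = "\u4e2d\u6587"  # Unicode text inside the program (the word for "Chinese")

# Convert to bytes only at the boundary, always naming the encoding.
as_utf8 = text.encode("utf-8")
as_gb2312 = text.encode("gb2312")

print(as_utf8)    # b'\xe4\xb8\xad\xe6\x96\x87'
print(as_gb2312)  # b'\xd6\xd0\xce\xc4'

# Reading bytes back requires knowing which encoding produced them.
assert as_utf8.decode("utf-8") == text
assert as_gb2312.decode("gb2312") == text
```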

Paul


On Oct 20, 6:47 pm, Paul Boddie <p...@boddie.org.uk> wrote:

> On 20 Okt, 07:32, est <electronix...@gmail.com> wrote:
>
> > Personally I call it a serious bug in Python
>
> Normally I'd entertain the possibility of bugs in Python, but your
> reasoning is a bit thin (in http://bugs.python.org/issue3648): "Why
> can't Python just define ascii to range(256)?"
>
> [...]
>
> Paul

Thanks for the long comment, Paul, but it doesn't address the massive
encoding errors in Python.

IMHO it's better to output the wrong encoding than to halt the WHOLE
damn program with an exception.

When debugging encoding problems, the solution is simple: if
characters display wrong, switch to another encoding. One of them must
be right.
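The "switch to another encoding until it looks right" approach amounts to decoding the same bytes under several candidate codecs and inspecting the results. A sketch of that trial-and-error loop; the candidate list is an assumption, and a codec succeeding is not proof it is the right one (Latin-1, for instance, decodes any byte sequence without error):

```python
def candidate_decodings(data: bytes, encodings=("utf-8", "gb2312", "big5", "shift_jis", "latin-1")):
    """Decode data with each candidate encoding; None marks a codec that rejected the bytes."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


# These four bytes are the GB2312 encoding of the two-character word for "Chinese".
for encoding, decoded in candidate_decodings(b"\xd6\xd0\xce\xc4").items():
    print(f"{encoding:10} {decoded!a}")
```

Here UTF-8 rejects the bytes, GB2312 gives the intended word, and Latin-1 "succeeds" with four unrelated accented letters, so a human still has to judge which result is right.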

But dealing with encodings in Python is tiring: you have to wrap
EVERY SINGLE character expression in try ... except ... Just imagine
what a pain that is.
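Python does offer an alternative to wrapping every expression in try/except: the `errors` argument of `str.encode` (and of text streams) selects a handler such as `"replace"` or `"backslashreplace"` that produces lossy output instead of raising. A sketch, assuming ASCII is the destination encoding that cannot cope:

```python
import io

text = "\u4e2d\u6587"  # two Chinese characters

print(text.encode("ascii", errors="replace"))           # b'??'
print(text.encode("ascii", errors="backslashreplace"))  # b'\\u4e2d\\u6587'

# The same handler can be set once on a stream, so individual writes never raise.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", errors="replace")
stream.write(text)
stream.flush()
print(raw.getvalue())  # b'??'
```

In Python 3.7 and later, an existing stream such as `sys.stdout` can be switched to a lenient handler with `sys.stdout.reconfigure(errors="replace")`.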

Just like the example I gave in Google Groups: u'\ue863' can NEVER be
encoded into '\xfe\x9f'. Not a chance, because Python REFUSES to handle
a byte outside range(128).

Strangely, the 'mbcs' encoding can do it. Does 'mbcs' have magic or
something? But it's Windows-specific.
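Whether a code point can be encoded depends on the mapping table of the codec being used. U+E863 lies in Unicode's Private Use Area, which GB2312 does not map, whereas the GB18030 standard covers every Unicode code point. The `mbcs` codec exists only on Windows and follows the system's ANSI code page, so it is not reproduced here, and this sketch does not check which byte sequence, if any, is "correct" for U+E863 (the helper name is illustrative):

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the codec has no mapping for some character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


PRIVATE_USE_CHAR = "\ue863"  # the code point from the example above

for enc in ("ascii", "gb2312", "gbk", "gb18030", "utf-8"):
    print(f"{enc:8} {try_encode(PRIVATE_USE_CHAR, enc)!r}")
```

ASCII and GB2312 have no mapping for this code point, while UTF-8 always does (`b'\xee\xa1\xa3'`); the GBK and GB18030 lines show what the locally installed codec tables produce.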

Dealing with character encodings is really simple. AFAIK the early
pre-Unicode encodings, although they have many names, are all based
on hacks. Take Chinese characters as an example. Their encoding is
called GB2312, and in fact it is totally compatible with range(256)
ANSI. (There are minor issues, like half of a wide character being
displayed as a question mark, but at least it's readable.) If you just
output a series of bytes, it IS GB2312. The same is true of BIG5, JIS,
etc.
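Part of this description can be checked: GB2312 (in its usual EUC-CN form) leaves ASCII bytes unchanged and uses two bytes with the high bit set for each Chinese character, and cutting such a pair in half leaves an undecodable byte that a lenient decoder shows as a replacement character. A sketch; note that the bytes only mean Chinese text to a reader who already assumes GB2312:

```python
data = "abc\u4e2d\u6587".encode("gb2312")  # three ASCII letters followed by two Chinese characters
print(data)  # b'abc\xd6\xd0\xce\xc4'

# ASCII characters keep their single-byte values; each Chinese character is two high bytes.
assert data[:3] == b"abc"
assert all(byte >= 0x80 for byte in data[3:])

# Truncating in the middle of a two-byte character leaves half a character behind.
truncated = data[:4]
print(ascii(truncated.decode("gb2312", errors="replace")))  # 'abc\ufffd'
```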

Like I said, str() should NOT throw an exception BY DESIGN; it's a
basic language standard. str() is not only a convert-to-string
function but, in most cases, also a serialization (e.g. for sockets).
My simple suggestion is: if it's a Unicode character, output it as
UTF-8; otherwise just output the byte array. Please do not encode it
with the really stupid range(128) ASCII. It's not guessing, it's
totally wrong.
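The proposal above can be written down as a function to make it concrete. The helper below is hypothetical and models only the suggested rule (text becomes UTF-8, bytes pass through untouched); it is not how str() behaves in either Python 2.x or 3.x:

```python
def to_wire_bytes(value: object) -> bytes:
    """Hypothetical serializer following the suggested rule (not Python's actual str())."""
    if isinstance(value, bytes):
        return value                   # already a byte array: pass it through untouched
    if isinstance(value, str):
        return value.encode("utf-8")   # Unicode text: always UTF-8, never ASCII
    return str(value).encode("utf-8")  # anything else: its text form, encoded as UTF-8


print(to_wire_bytes("\u4e2d\u6587"))       # b'\xe4\xb8\xad\xe6\x96\x87'
print(to_wire_bytes(b"\xd6\xd0\xce\xc4"))  # b'\xd6\xd0\xce\xc4'
print(to_wire_bytes(42))                   # b'42'
```

The first two calls show the trade-off Paul raised: UTF-8 output and pass-through GB2312 bytes leave the same function, and nothing in the result records which encoding the receiver should assume.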

