i18n: looking for expertise


Problem description


Hello all,

I am trying to internationalize my Tkinter program using gettext and
have run into various problems, so it looks like it's not a trivial
task.
After some "research" I made up a few rules for a concept that I hope
lets me avoid further encoding trouble, but I would feel more
confident if some of the experts here would have a look at my
thoughts so far and tell me if I'm still going wrong somewhere
(BTW, the program is supposed to run on Linux only). So here is what I
have so far:

1. Use unicode instead of byte strings wherever possible. This can be
a little tricky, because in some situations I cannot know in advance
whether a certain string is unicode or a byte string; I wrote a helper
module for this which defines convenience methods for fail-safe
decoding/encoding of strings and a Tkinter.UnicodeVar class which I
use to convert user input to unicode on the fly (see the code below).

2. So I will have to call gettext.install() with unicode=1.

3. Make sure to NEVER mix unicode and byte strings within one
expression.

4. In order to maintain code readability, it's better to risk excess
decode/encode cycles than to have one too few.

5. File operations seem to be delicate; at least I got an error when I
passed a filename that contains special characters as unicode to
os.access(), so I guess that whenever I do file operations
(os.remove(), shutil.copy() ...) the filename should be encoded back
into the system encoding first. The filename manipulations done by the
os.path methods seem to be simply string manipulations, so encoding
the filenames doesn't seem to be necessary there.

6. Messages that are printed to stdout should be encoded first, too;
the same goes for strings I use to call external shell commands.

############ file UnicodeHandler.py ##################################
# -*- coding: iso-8859-1 -*-
import Tkinter
import sys
import locale
import codecs

def _find_codec(encoding):
    # return True if the requested codec is available, else return False
    try:
        codecs.lookup(encoding)
        return 1
    except LookupError:
        print 'Warning: codec %s not found' % encoding
        return 0

def _sysencoding():
    # try to guess the system default encoding
    try:
        enc = locale.getpreferredencoding().lower()
        if _find_codec(enc):
            print 'Setting locale to %s' % enc
            return enc
    except AttributeError:
        # our python is too old, try something else
        pass
    enc = locale.getdefaultlocale()[1].lower()
    if _find_codec(enc):
        print 'Setting locale to %s' % enc
        return enc
    # the last try
    enc = sys.stdin.encoding.lower()
    if _find_codec(enc):
        print 'Setting locale to %s' % enc
        return enc
    # aargh, nothing good found, fall back to latin1 and hope for the best
    print 'Warning: cannot find usable locale, using latin-1'
    return 'iso-8859-1'

sysencoding = _sysencoding()

def fsdecode(input, errors='strict'):
    '''Fail-safe decodes a string into unicode.'''
    if not isinstance(input, unicode):
        return unicode(input, sysencoding, errors)
    return input

def fsencode(input, errors='strict'):
    '''Fail-safe encodes a unicode string into system default encoding.'''
    if isinstance(input, unicode):
        return input.encode(sysencoding, errors)
    return input

class UnicodeVar(Tkinter.StringVar):
    def __init__(self, master=None, errors='strict'):
        Tkinter.StringVar.__init__(self, master)
        self.errors = errors
        self.trace('w', self._str2unicode)

    def _str2unicode(self, *args):
        old = self.get()
        if not isinstance(old, unicode):
            new = fsdecode(old, self.errors)
            self.set(new)
#######################################################################

So before I start to mess up all of my code, maybe someone can give me
a hint if I still forgot something I should keep in mind or if I am
completely wrong somewhere.

Thanks in advance

Michael
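
For readers on Python 3, where `str` is already Unicode and `bytes` is the byte-string type, the two helper functions from the post could look roughly like this. It is only a minimal sketch under that assumption: the names `system_encoding`, `to_text` and `to_bytes` are hypothetical and are not part of the original module.

```python
import codecs
import locale


def system_encoding(default="iso-8859-1"):
    """Guess the locale's preferred encoding, falling back to `default`."""
    enc = locale.getpreferredencoding(False) or default
    try:
        codecs.lookup(enc)
    except LookupError:
        # Unknown codec name: fall back, as the original module does.
        return default
    return enc.lower()


def to_text(value, encoding=None, errors="strict"):
    """Return `value` as str, decoding it only if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding or system_encoding(), errors)
    return value


def to_bytes(value, encoding=None, errors="strict"):
    """Return `value` as bytes, encoding it only if it is str."""
    if isinstance(value, str):
        return value.encode(encoding or system_encoding(), errors)
    return value
```

Passing the encoding explicitly, as in `to_bytes(name, "utf-8")`, keeps the behaviour independent of the locale the program happens to run under.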

Solution

Michael:

> 5. file operations seem to be delicate; at least I got an error when I
> passed a filename that contains special characters as unicode to
> os.access(), so I guess that whenever I do file operations
> (os.remove(), shutil.copy() ...) the filename should be encoded back
> into system encoding before;

This can lead to failure on Windows when the true Unicode file name
cannot be encoded in the current system encoding.

Neil
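
The failure Neil describes is not specific to Windows: a strict encode raises as soon as a name contains a character the target encoding lacks. A small sketch in Python 3 syntax (the helper name `encodable` and the sample file names are made up for illustration):

```python
def encodable(name, encoding):
    """Return True if `name` can be encoded to `encoding` without loss."""
    try:
        name.encode(encoding, "strict")
    except UnicodeEncodeError:
        return False
    return True


latin_name = "caf\u00e9.txt"          # e with acute accent, present in Latin-1
cjk_name = "\u65e5\u672c\u8a9e.txt"   # three CJK characters, absent from Latin-1

print(encodable(latin_name, "iso-8859-1"))  # True
print(encodable(cjk_name, "iso-8859-1"))    # False
print(encodable(cjk_name, "utf-8"))         # True
```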


"Neil Hodgson" <nh******@bigpond.net.au> wrote in message news:<6O********************@news-server.bigpond.net.au>...

> Michael:
>
> > 5. file operations seem to be delicate; at least I got an error when I
> > passed a filename that contains special characters as unicode to
> > os.access(), so I guess that whenever I do file operations
> > (os.remove(), shutil.copy() ...) the filename should be encoded back
> > into system encoding before;
>
> This can lead to failure on Windows when the true Unicode file name
> cannot be encoded in the current system encoding.
>
> Neil

Like I said, it's only supposed to run on Linux; anyway, is it likely
that problems will arise when the filenames I have to handle have
basically three sources:

1. already existing files

2. automatically generated filenames, which result from adding an
ascii-only suffix to an existing filename (like xy --> xy_bak2)

3. filenames created by user input

?
If yes, how can I avoid these?

Any hints are appreciated

Michael
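
For sources 1 and 2, one way to stay safe is to make sure the original bytes of an existing name always survive the trip through Unicode. The sketch below uses Python 3's `surrogateescape` error handler for that; this is an assumption about the runtime, since the thread itself targets Python 2, and the function name `backup_name` is hypothetical. On Python 3, `os.fsdecode()` and `os.fsencode()` do a similar round trip with the filesystem encoding.

```python
def backup_name(raw_name, suffix="_bak2", encoding="utf-8"):
    """Append an ASCII-only suffix to a byte filename without losing bytes.

    Bytes that are not valid in `encoding` are carried through the str
    form as lone surrogates and restored unchanged when encoding back.
    """
    text = raw_name.decode(encoding, "surrogateescape")
    return (text + suffix).encode(encoding, "surrogateescape")


# A valid UTF-8 name and a Latin-1 name that is invalid as UTF-8.
print(backup_name(b"caf\xc3\xa9"))
print(backup_name(b"caf\xe9"))
```

Both calls return the input bytes followed by `_bak2`, so an existing file can be renamed or copied even when its name does not match the current locale.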


Michael:

> Like I said, it's only supposed to run on linux; anyway, is it likely
> that problems will arise when filenames I have to handle have
> basically three sources:
> ...
> 3. filenames created by user input

Have you worked out how you want to handle user input that is not
representable in the encoding? It is easy for users to input any characters
into a Unicode-enabled UI, either through invoking an input method or by
copying and pasting from another application or character chooser applet.

Neil
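
One possible answer to that question is to pick a policy up front: either reject such input, or substitute the characters that cannot be represented. The following is a sketch of the substitution policy in Python 3 syntax; the function name `sanitize_for_encoding` and the `_` placeholder are assumptions for illustration, not something proposed in the thread.

```python
def sanitize_for_encoding(name, encoding, placeholder="_"):
    """Replace every character that `encoding` cannot represent."""
    cleaned = []
    for ch in name:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            cleaned.append(placeholder)
        else:
            cleaned.append(ch)
    return "".join(cleaned)


user_input = "r\u00e9sum\u00e9_\u65e5\u672c.txt"
print(sanitize_for_encoding(user_input, "iso-8859-1"))
```

Here the accented letters survive because Latin-1 contains them, while the two CJK characters each become `_`. Substitution can make two different inputs collide on the same filename, so rejecting the input and asking the user again may be the safer choice.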

