Becoming Unicode Aware


Problem description


I'm trying to become 'unicode-aware'... *sigh*. What's that quote - 'a
native speaker of ascii will never learn to speak unicode like a
native'. The trouble is I think I've been a native speaker of latin-1
without realising it.

My main problem with understanding unicode is what to do with
arbitrary text without an encoding specified. To the best of my
knowledge the technical term for this situation is 'buggered'. E.g. I
have a CGI guestbook script. Is the only way of knowing what encoding
the user is typing in, to ask them ?

Anyway - ConfigObj reads config files from plain text files. Is there
a standard for specifying the encoding within the text file ? I know
python scripts have a method - should I just use that ?

Also - suppose I know the encoding, or let the programmer specify, is
the following sufficient for reading the files in :

    def afunction(setoflines, encoding='ascii'):
        for line in setoflines:
            if encoding:
                line = line.decode(encoding)

Regards,
Fuzzy
http://www.voidspace.org.uk/atlantib...thonutils.html

Recommended answer

> My main problem with understanding unicode is what to do with
> arbitrary text without an encoding specified. To the best of my
> knowledge the technical term for this situation is 'buggered'. E.g. I
> have a CGI guestbook script. Is the only way of knowing what encoding
> the user is typing in, to ask them ?

Unfortunately the HTTP standard seems to lack a specification of how the
encoding of form data is to be transferred. But it seems that most browsers
that understand the encoding your page is delivered in will use that same
encoding for replying.
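
A minimal sketch of that idea in present-day Python 3, assuming the guestbook page was served as ISO-8859-1 and the browser replied in the same charset. `PAGE_CHARSET` and `decode_form` are hypothetical names for illustration, not part of any script mentioned in this thread:

```python
from urllib.parse import parse_qs, urlencode

# Assumption: the charset the guestbook page itself was delivered in.
PAGE_CHARSET = "iso-8859-1"


def decode_form(query_string, charset=PAGE_CHARSET):
    """Decode URL-encoded form data, assuming it uses the page's charset."""
    fields = parse_qs(query_string, encoding=charset, errors="strict")
    # Keep only the first value submitted for each field name.
    return {name: values[0] for name, values in fields.items()}


# Simulate what a browser might send back for a latin-1 page.
raw = urlencode({"name": "José"}, encoding=PAGE_CHARSET)
print(raw)
print(decode_form(raw))
```

If the browser did not actually reply in the page's charset, the decoded text may be silently wrong, which is why this is only as reliable as that assumption.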

> Anyway - ConfigObj reads config files from plain text files. Is there
> a standard for specifying the encoding within the text file ? I know
> python scripts have a method - should I just use that ?

No idea what ConfigObj is - is it your own config parser?
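
For reference, the "method" Python scripts use is a `coding:` comment on the first or second line of the file. The sketch below borrows that convention for a config file by reusing the standard library's `tokenize.detect_encoding`. It is an assumption that a config format would want to adopt this convention, and `read_config_lines` is a hypothetical helper, not a ConfigObj function:

```python
import io
import tokenize


def read_config_lines(raw):
    """Decode raw config bytes using a Python-style coding comment.

    Falls back to UTF-8 when no coding comment is present, because that
    is the default used by tokenize.detect_encoding.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding).splitlines()


sample = b"# -*- coding: latin-1 -*-\nname = Jos\xe9\n"
lines = read_config_lines(sample)
print(lines[1])
```

Note that the declaration line itself has to be readable before the encoding is known, so in practice it should be plain ASCII.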
> Also - suppose I know the encoding, or let the programmer specify, is
> the following sufficient for reading the files in :
>
> def afunction(setoflines, encoding='ascii'):
>     for line in setoflines:
>         if encoding:
>             line = line.decode(encoding)







Yes, it should be - but why the if? It is unnecessary, as its condition will
always be true - and you _want_ it that way, as the result of afunction
should always be unicode objects, no matter what encoding was used.
--
Regards,

Diez B. Roggisch
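
A sketch of that advice in Python 3 terms, where the undecoded input is `bytes` and the decoded result is `str` (the thread predates Python 3, so this is a translation of the idea rather than the original code). `decode_lines` is a hypothetical name. One detail worth noting: rebinding `line` inside the loop, as in the quoted snippet, does not change the original sequence, so the decoded values have to be collected and returned:

```python
def decode_lines(lines, encoding="ascii"):
    """Return a new list in which every line is a decoded text string."""
    decoded = []
    for line in lines:
        # Decode only raw bytes; text that is already decoded passes through.
        if isinstance(line, bytes):
            line = line.decode(encoding)
        decoded.append(line)
    return decoded


result = decode_lines([b"caf\xe9\n", "already text\n"], encoding="latin-1")
print(result)
```

With the default `encoding="ascii"`, any byte above 127 raises `UnicodeDecodeError`, which at least makes a wrong guess visible instead of producing garbage.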

fu******@gmail.com (Michael Foord) wrote ...

> I'm trying to become 'unicode-aware'... *sigh*. What's that quote - 'a
> native speaker of ascii will never learn to speak unicode like a
> native'. The trouble is I think I've been a native speaker of latin-1
> without realising it.

It *is* odd, IMHO, that my database connector spits out string-like
things that have 8-bit data so that when I
"".join(array_of_database_strings) them, I get a failure. I've
learned to convert them by hand into unicode strings, but it is odd.
Something like a pair (encoding, string) seems more natural to me, but
probably I just don't get the issues.
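
A minimal sketch of the "convert by hand" step in Python 3, assuming the connector returns raw `bytes` in a known encoding. `DB_ENCODING`, `join_rows`, and the sample data are hypothetical and do not come from any particular database driver:

```python
# Assumption: the encoding the database connector uses for its 8-bit data.
DB_ENCODING = "latin-1"


def join_rows(rows, encoding=DB_ENCODING, sep=" "):
    """Decode each raw value once at the boundary, then join as text."""
    return sep.join(row.decode(encoding) for row in rows)


array_of_database_strings = [b"caf\xe9", b"na\xefve"]
print(join_rows(array_of_database_strings))
```

Decoding once, right where the data enters the program, means everything downstream only ever handles text.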

> My main problem with understanding unicode is what to do with
> arbitrary text without an encoding specified. To the best of my
> knowledge the technical term for this situation is 'buggered'. E.g. I
> have a CGI guestbook script. Is the only way of knowing what encoding
> the user is typing in, to ask them ?





I found this link
https://bugzilla.mozilla.org/show_bug.cgi?id=18643#c12
useful.

Jim


In article <54**************************@posting.google.com>,
jh*******@smcvt.edu (Jim Hefferon) wrote:

> > My main problem with understanding unicode is what to do with
> > arbitrary text without an encoding specified. To the best of my
> > knowledge the technical term for this situation is 'buggered'. E.g. I
> > have a CGI guestbook script. Is the only way of knowing what encoding
> > the user is typing in, to ask them ?
>
> I found this link
> https://bugzilla.mozilla.org/show_bug.cgi?id=18643#c12
> useful.





Likewise, I found this link
http://www.w3schools.com/tags/tag_form.asp
useful. See the accept-charset attribute.

Just

