How well does your language support Unicode in practice?

Question

I'm looking into new languages for a new project, and I'm craving one where I no longer need to worry about charset problems, which are just one of the inordinate number of niggles I have with PHP.

I tend to find Java too verbose and messy, and my not wanting to touch Windows with a 6-foot pole tends to rule out .NET. That leaves essentially everything else, except PHP, C and C++ (the latter two of which I know get messy with Unicode irrespective of the ICU library).

I've shortlisted a few languages to date, namely Ruby (loved the mixins), Python, Lisp and JavaScript (Node.js). However, I keep coming across highly inconsistent information on their Unicode support, and for lack of time I dread having to learn each and every one of them to the point where I can safely break it in order to rule it out.

As far as I understand, Python 3 seems to have it, as does Ruby 1.9. Lisp not necessarily. JavaScript presumably.

There's arguably more to a language than Unicode support, but in my experience a lack of it tends to become a major drawback when dealing with locales.

I also realize the question is somewhat subjective. (Please don't close it on those grounds: I'm actually linking to several SO threads which I found unsatisfying.) But as a user of any of these languages, how well do they support Unicode in practice?

Recommended answer

Python's Unicode support did not really change in 3.x. It has been pretty much the same since Python 2.x, which introduced the separate unicode type and the encoding handling. What Python 3.x changes is that unicode becomes the only string type (and is renamed to str), whereas 2.x has bytestrings (str, "...") and unicode strings (unicode, u"...") that often, but not always, don't quite mix. (Allowing them to mix was an attempt to make transitioning from bytestrings to unicode easier, but it turned out to be a mistake.) All in all, Python's Unicode support is quite good, mistakes in Python 2.x notwithstanding. There are unicode literals with numeric and named escapes, source-encoding declarations for non-ASCII characters in unicode literals, automatic encoding/decoding through the codecs module, Unicode support in many libraries (like the regular expression and DB-API modules) and a built-in Unicode database.
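
A few of the features listed above can be sketched in a short Python 3 snippet. This is only an illustration, assuming Python 3 and the standard library; the sample strings and variable names are made up for the example and are not part of the original answer.

```python
import re
import unicodedata

# Numeric and named escapes in a str literal (in Python 3, str is Unicode text).
numeric = "\u00e9"                              # e-acute via a numeric escape
named = "\N{LATIN SMALL LETTER E WITH ACUTE}"   # the same character via its name

# The built-in Unicode database: character names, categories, normalization.
char_name = unicodedata.name(numeric)
char_category = unicodedata.category(numeric)
decomposed = unicodedata.normalize("NFD", numeric)  # base letter + combining accent

# Unicode-aware regular expressions: \w matches non-ASCII letters for str patterns.
words = re.findall(r"\w+", "naïve café")

# Text and bytes are distinct types and no longer mix implicitly.
try:
    "abc" + b"abc"
    mixes_implicitly = True
except TypeError:
    mixes_implicitly = False

print(char_name, char_category, len(decomposed), words, mixes_implicitly)
```

In Python 2.x the last step would have succeeded for pure-ASCII data and failed only once non-ASCII bytes showed up, which is the inconsistent mixing the answer calls a mistake.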

That said, you still need to know about encodings in order to handle text correctly. Your program will receive bytes in some encoding (be it from files, from environment variables or through other input) and they will need to be interpreted in that encoding. If you don't know the encoding (and can't determine it from the data, as you can with HTML or XML) you can really only process the data as bytes. If you do know the encoding, Python does allow you to deal with it mostly transparently.
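
As a rough sketch of that last point, the snippet below decodes bytes whose encoding is known, shows what happens when the wrong encoding is assumed, and lets a text wrapper do the decoding transparently. It assumes Python 3; the sample data and the choice of Latin-1 as the "known" encoding are invented for the illustration.

```python
import io

# Bytes as they might arrive from a file or socket; here we know they are Latin-1.
raw = "naïve café".encode("latin-1")

# Knowing the encoding, decoding yields proper text.
text = raw.decode("latin-1")

# Assuming the wrong encoding fails loudly for this input instead of guessing.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Mostly transparent handling: wrap the byte stream once, then work with str only.
# open(path, encoding="latin-1") gives the same behaviour for real files.
wrapped = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1").read()

# Encode again only at the output boundary, in whatever encoding the consumer expects.
as_utf8 = text.encode("utf-8")

print(text, utf8_failed, wrapped == text, len(raw), len(as_utf8))
```

If the encoding really is unknown, the safe option is to keep `raw` as bytes and pass it through untouched rather than decode it on a guess.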
