Is there a programming language with full and correct Unicode support?
Most programming languages have some support for Unicode, but all of them have more or less documented corner cases where things don't work correctly.
Examples
Java: reverse() in StringBuilder/StringBuffer works correctly, but length(), charAt(), etc. in String do not if a character needs more than 16 bits to encode.
C#: I didn't find a correct reverse method; Length and indexed access return wrong results.
Perl: Same problem.
PHP: Has no notion of Unicode at all; mbstring provides some better-working replacements.
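To make the Java case concrete, here is a minimal sketch of the mismatch, assuming a string that contains one character outside the Basic Multilingual Plane. The class name `CodePointDemo` and its helper methods are made up for illustration; `codePointCount`, `offsetByCodePoints`, `codePointAt` and `StringBuilder.reverse` are standard library calls. Note that `reverse()` keeps surrogate pairs together, but as far as I know it does not keep combining sequences together.

```java
public class CodePointDemo {
    // Hypothetical helper: number of Unicode code points, not UTF-16 code units.
    public static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Hypothetical helper: code point at a code point index rather than a char index.
    public static int codePointAtIndex(String s, int index) {
        return s.codePointAt(s.offsetByCodePoints(0, index));
    }

    public static void main(String[] args) {
        // 'a', then U+1D11E (MUSICAL SYMBOL G CLEF) as a surrogate pair, then 'b'.
        String s = "a\uD834\uDD1Eb";

        System.out.println(s.length());          // 4: counts UTF-16 code units
        System.out.println(codePointLength(s));  // 3: counts code points

        // charAt(1) only sees the high surrogate, half of the character.
        System.out.println(Character.isHighSurrogate(s.charAt(1)));

        // Looking up by code point index returns the whole character.
        System.out.println(Integer.toHexString(codePointAtIndex(s, 1)));

        // reverse() treats the surrogate pair as a single unit.
        String reversed = new StringBuilder(s).reverse().toString();
        System.out.println(reversed.equals("b\uD834\uDD1Ea"));
    }
}
```

So the correct answers are available in Java, just not through the methods most people reach for first.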
I wonder if there is a programming language which has full and correct Unicode support. What compromises had to be made to achieve that?
- More complex algorithms?
- Higher memory consumption?
- Slower performance?
How was it implemented internally?
- Array of Ints, Linked Lists, etc.
- Additional buffering
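As a rough illustration of the "array of ints" option, here is a sketch of a string type that stores one 32-bit int per code point. `CodePointString` is a hypothetical class, not something from any real library, and I am not claiming any language implements strings this way. It trades memory (four bytes per code point, even for ASCII) for constant-time length and indexing by code point.

```java
public class CodePointString {
    // One int per code point: O(1) indexing, at four bytes per character.
    private final int[] codePoints;

    public CodePointString(String s) {
        // CharSequence.codePoints() combines surrogate pairs into single values.
        this.codePoints = s.codePoints().toArray();
    }

    private CodePointString(int[] codePoints) {
        this.codePoints = codePoints;
    }

    public int length() {
        return codePoints.length;
    }

    public int codePointAt(int index) {
        return codePoints[index];
    }

    // Reverses by code point; combining sequences are NOT kept together.
    public CodePointString reversed() {
        int[] out = new int[codePoints.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = codePoints[codePoints.length - 1 - i];
        }
        return new CodePointString(out);
    }

    @Override
    public String toString() {
        return new String(codePoints, 0, codePoints.length);
    }
}
```

Even this only gets code points right. Indexing by grapheme would need another layer on top, which is probably where the "more complex algorithms" part comes in.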
I saw that Python 3 had some pretty big changes in this area. How close is Python 3 now to a correct implementation?
It looks like Perl 6 gets good Unicode support:
perlgeek.de/en/article/5-to-6#post_17
For instance, it provides you with three different length methods:
- bytes (number of bytes)
- codes (number of code points)
- graphs (number of graphemes)
This gets integrated into Perl's regular expressions as well.
Looks like a step in the right direction to me.
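For comparison, the three lengths listed above can be approximated in Java with the standard library. This is only a sketch: the class `UnicodeLengths` and its method names are made up to mirror the Perl 6 ones, UTF-8 is an assumption for the byte count, and `BreakIterator.getCharacterInstance()` is used as an approximation of grapheme boundaries (its handling of more complex clusters, such as some emoji sequences, depends on the JDK version).

```java
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;

public class UnicodeLengths {
    // Number of bytes, assuming a UTF-8 encoding.
    public static int bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points.
    public static int codes(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of user-perceived characters, approximated with BreakIterator.
    public static int graphs(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
        String s = "e\u0301";
        System.out.println(bytes(s));   // 3
        System.out.println(codes(s));   // 2
        System.out.println(graphs(s));  // 1
    }
}
```

The point is that "length" has at least three reasonable answers for the same string, so a language has to expose all of them to be called correct.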