What factors make PHP Unicode-incompatible?

Question

I am able to use UTF-8 characters just fine in my scripts.

In fact, it is even possible to have variable and function names that contain Unicode characters.

There is also the mbstring extension, which deals with multi-byte strings, yet countless articles criticize PHP for its lack of Unicode support.

I don't get it. Why is PHP said not to support Unicode?

Recommended answer

When PHP was started years ago, UTF-8 was not really supported. That was a time when non-Unicode operating systems like Windows 98/Me were still current and when other major languages like Delphi were also non-Unicode. Not every language was designed with Unicode in mind from day one, and switching a language over to Unicode completely without breaking a lot of existing code is hard. Delphi, for example, only became Unicode-compatible a year or two ago, while languages like Java and C# were designed around Unicode from day one.

So as PHP grew and became PHP 3, PHP 4 and now PHP 5, simply no one decided to add Unicode. Why? Presumably to stay compatible with existing scripts, or because utf8_decode/utf8_encode and mbstring already existed and worked. I do not know for sure, but I strongly believe it has to do with organic growth. Features do not exist by default; someone has to write them, and for PHP that simply has not happened yet.

OK, I read the question wrong. The real question is: how are strings stored internally? If I type "Währung" or "Écriture", which encoding is used to produce the bytes? In PHP's case, it is ASCII with a code page: a string is just a sequence of bytes, and nothing in the string records which encoding those bytes are in. That means that if I encode a string using ISO-8859-15 and you decode it with some Chinese code page, you will get weird results. The alternative is what languages like C# or Java do, where every string is stored as Unicode: there is no code page anymore, and in theory you cannot mess up. I recommend Joel's article about Unicode and character sets, but it essentially boils down to how strings are stored internally. For PHP the answer is "not in Unicode", which means you have to be very careful and explicit when processing strings so that they stay in the proper encoding through input, storage (database) and output. That is very error-prone.
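To make the "bytes, not characters" point concrete, here is a minimal sketch. It is not part of the original answer, and it assumes PHP 7 or later (for the `\u{...}` escape, which emits UTF-8 bytes) with the mbstring extension loaded. The variable names are purely illustrative.

```php
<?php
// "Währung" built with an explicit escape, so the result does not depend
// on how this source file is saved. \u{00E4} produces the UTF-8 bytes of "ä".
$word = "W\u{00E4}hrung";

// strlen() counts bytes, and "ä" takes two bytes in UTF-8.
$byteLength = strlen($word);

// mb_strlen() counts characters, but only because we state the encoding;
// the string itself carries no such information.
$charLength = mb_strlen($word, 'UTF-8');

// The same text in ISO-8859-15 is a different byte sequence.
$latin = mb_convert_encoding($word, 'ISO-8859-15', 'UTF-8');

// Wrong assumption on purpose: treat the UTF-8 bytes as ISO-8859-15.
// PHP cannot detect the mistake; each byte is read as one character.
$misread = mb_convert_encoding($word, 'UTF-8', 'ISO-8859-15');

echo $byteLength, "\n";                     // 8
echo $charLength, "\n";                     // 7
echo bin2hex($word), "\n";                  // 57c3a46872756e67
echo bin2hex($latin), "\n";                 // 57e46872756e67
echo mb_strlen($misread, 'UTF-8'), "\n";    // 8
```

The last line is the "weird results" case in miniature: seven characters come back as eight because the two bytes of "ä" were each interpreted as a separate character. Nothing raised an error along the way, which is why every input, storage and output step has to agree on the encoding explicitly.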
