Declaration to make PHP script completely Unicode-friendly

Question

Remembering to do all the stuff you need to do in PHP to get it to work properly with Unicode is far too tricky, tedious, and error-prone, so I'm looking for the trick to get PHP to magically upgrade absolutely everything it possibly can from musty old ASCII byte mode into modern Unicode character mode, all at once and by using just one simple declaration.

The idea is to modernize PHP scripts to work with Unicode without having to clutter up the source code with a bunch of confusing alternate function calls and special regexes. Everything should just "Do The Right Thing" with Unicode, no questions asked.

Given that the goal is maximum Unicodeness with minimal fuss, this declaration must at least do these things (plus anything else I’ve forgotten that furthers the overall goal):

  • The PHP script source is itself considered to be in UTF‑8 (eg, strings and regexes).

  • All input and output is automatically converted to/from UTF‑8 as needed, with a normalization option (eg, all input normalized to NFD and all output normalized to NFC).

  • All functions with Unicode versions use those instead (eg, Collator::sort for sort).

  • All byte functions (eg, strlen, strstr, strpos, and substr) work like the corresponding character functions (eg, mb_strlen, mb_strstr, mb_strpos, and mb_substr).

  • All regexes and regexy functions transparently work on Unicode (ie, as if every preg_* call had /u tacked on implicitly, and things like \w and \b and \s all work on Unicode the way The Unicode Standard requires them to work, etc).

For extra credit :), I'd like there to be a way to "upgrade" this declaration to full grapheme mode. That way the byte or character functions become grapheme functions (eg, grapheme_strlen, grapheme_strstr, grapheme_strpos, and grapheme_substr), and the regex stuff works on proper graphemes (ie, . — or even [^abc] — matches a Unicode grapheme cluster no matter how many code points it contains, etc).
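To make the byte / character / grapheme distinction in the list above concrete, here is a minimal sketch. It assumes PHP 7 or later (for the `\u{...}` escape) with the mbstring and intl extensions loaded; the sample string is purely illustrative.

```php
<?php
// Sketch only: assumes PHP 7+ with the mbstring and intl extensions loaded.
$s = "e\u{0301}";   // "e" followed by U+0301 COMBINING ACUTE ACCENT

$bytes      = strlen($s);                 // counts bytes
$codePoints = mb_strlen($s, 'UTF-8');     // counts code points
$graphemes  = grapheme_strlen($s);        // counts grapheme clusters

$dotBytes  = preg_match_all('/./', $s);   // without /u, "." matches one byte
$dotPoints = preg_match_all('/./u', $s);  // with /u, "." matches one code point
$clusters  = preg_match_all('/\X/u', $s); // \X matches one grapheme cluster

printf("bytes=%d code points=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);
printf("regex: bytes=%d code points=%d clusters=%d\n", $dotBytes, $dotPoints, $clusters);
```

The same user-visible "character" counts as 3, 2, or 1 depending on which function family is used, which is the gap the requested declaration is meant to close.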

Recommended answer

That full-Unicode thing was precisely the idea of PHP 6, which was canceled more than a year ago.

So, no, there is no way of getting all that -- except by using the right functions, and remembering that characters are not the same as bytes.
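As a rough illustration of what "using the right functions" can look like in practice, the sketch below normalizes input, then uses character-aware and locale-aware calls explicitly. It assumes PHP 7 or later with the mbstring and intl extensions; the sample strings and the `en_US` locale are arbitrary choices for the example, not something the answer prescribes.

```php
<?php
// Sketch only: assumes PHP 7+ with the mbstring and intl extensions loaded.
mb_internal_encoding('UTF-8');            // default encoding for mb_* calls

// "résumé" written with combining accents (decomposed form).
$input = "re\u{0301}sume\u{0301}";
$text  = Normalizer::normalize($input, Normalizer::FORM_C);  // compose to NFC

$firstThree = mb_substr($text, 0, 3);     // first three characters, not bytes
$upper      = mb_strtoupper($text);       // Unicode-aware case mapping

$words = ['zebra', "\u{00E9}clair", 'apple'];
$collator = new Collator('en_US');        // locale is an illustrative assumption
$collator->sort($words);                  // locale-aware ordering, in place

echo $firstThree, "\n", $upper, "\n", implode(', ', $words), "\n";
```

A plain `sort()` compares bytes, so it would place the accented word after "zebra"; the `Collator` orders it between "apple" and "zebra".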


One thing that might help with your fourth point, though, is the function overloading feature of the mbstring extension (quoting):

mbstring supports a 'function overloading' feature which enables you to add multibyte awareness to such an application without code modification by overloading multibyte counterparts on the standard string functions.
For example, mb_substr() is called instead of substr() if function overloading is enabled.
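For reference, overloading is controlled by the `mbstring.func_overload` setting in php.ini, a bitmask in which the value 2 covers the string functions. The feature was deprecated in PHP 7.2 and removed in PHP 8.0, so it only applies to older installations. The following sketch checks whether it is active; the variable names are illustrative.

```php
<?php
// Sketch only: reports whether mbstring overloading of string functions is on.
// ini_get() returns false when the setting does not exist (PHP 8+);
// the (int) cast turns that into 0.
$flags = (int) ini_get('mbstring.func_overload');
$stringOverload = ($flags & 2) === 2;     // bit value 2: strlen(), substr(), strpos(), ...

$len = strlen("\u{00E9}");                // U+00E9 is two bytes in UTF-8
// Normally 2; expected to be 1 when overloading is on and the
// mbstring internal encoding is UTF-8 (an assumption about the configuration).
printf("string overloading: %s, strlen = %d\n", $stringOverload ? 'on' : 'off', $len);
```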
