Declaration to make PHP script completely Unicode-friendly

Views: 63 | Published: 2020/5/27 2:52:39 | Tags: php regex unicode utf-8

Problem description

Remembering to do all the stuff you need to do in PHP to get it to work properly with Unicode is far too tricky, tedious, and error-prone, so I'm looking for the trick to get PHP to magically upgrade absolutely everything it possibly can from musty old ASCII byte mode into modern Unicode character mode, all at once and by using just one simple declaration.

The idea is to modernize PHP scripts to work with Unicode without having to clutter up the source code with a bunch of confusing alternate function calls and special regexes. Everything should just "Do The Right Thing" with Unicode, no questions asked.

Given that the goal is maximum Unicodeness with minimal fuss, this declaration must at least do these things (plus anything else I've forgotten that furthers the overall goal):

The PHP script source is itself considered to be in UTF-8 (e.g., strings and regexes).

All input and output is automatically converted to/from UTF-8 as needed, with a normalization option (e.g., all input normalized to NFD and all output normalized to NFC).

All functions with Unicode versions use those instead (e.g., Collator::sort for sort).

All byte functions (e.g., strlen, strstr, strpos, and substr) work like the corresponding character functions (e.g., mb_strlen, mb_strstr, mb_strpos, and mb_substr).

All regexes and regexy functions transparently work on Unicode (i.e., as if all the preg_* functions had /u tacked on implicitly, and things like \w and \b and \s all work on Unicode the way The Unicode Standard requires them to work, etc).
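To make the byte-versus-character gap in the last two points concrete, here is a minimal sketch of what currently has to be written by hand. It assumes PHP 7 or later (for the \u{...} escape) with the mbstring extension loaded; the variable names are only illustrative.

```php
<?php
// "é" as one precomposed code point (U+00E9), which takes two bytes in UTF-8.
$word = "h\u{00E9}llo";

// Byte-oriented functions count UTF-8 bytes.
$byteLength = strlen($word);              // 6
// mbstring functions count code points when given the encoding.
$charLength = mb_strlen($word, 'UTF-8');  // 5

// substr() can cut a multibyte character in half; mb_substr() cannot.
$byteSlice = substr($word, 0, 2);              // "h" plus the first byte of "é"
$charSlice = mb_substr($word, 0, 2, 'UTF-8');  // "hé"
$byteSliceIsValid = mb_check_encoding($byteSlice, 'UTF-8');

// Without /u the dot matches one byte; with /u it matches one code point.
$dotBytes = preg_match_all('/./s', $word);
$dotChars = preg_match_all('/./su', $word);

printf("strlen=%d mb_strlen=%d\n", $byteLength, $charLength);
printf("dot without /u=%d, dot with /u=%d\n", $dotBytes, $dotChars);
printf("byte slice is valid UTF-8: %s\n", $byteSliceIsValid ? 'yes' : 'no');
```

Every one of these choices (mb_ prefix, explicit encoding argument, /u modifier) is exactly the per-call clutter the requested declaration would be meant to remove.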

For extra credit :), I'd like there to be a way to "upgrade" this declaration to full grapheme mode. That way the byte or character functions become grapheme functions (e.g., grapheme_strlen, grapheme_strstr, grapheme_strpos, and grapheme_substr), and the regex stuff works on proper graphemes (i.e., . or even [^abc] matches a Unicode grapheme cluster no matter how many code points it contains, etc).
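As a rough illustration of the three levels (bytes, code points, graphemes), the sketch below counts the same string three ways. It assumes PHP 7 or later; count_graphemes is a hypothetical helper, not a built-in, which uses grapheme_strlen when the intl extension is loaded and otherwise falls back to PCRE's \X escape.

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT:
// three bytes, two code points, one user-perceived character.
$text = "e\u{0301}";

// Hypothetical helper: prefer intl's grapheme_strlen(), else count \X matches.
function count_graphemes(string $s): int
{
    if (function_exists('grapheme_strlen')) {
        return (int) grapheme_strlen($s);
    }
    // \X matches one extended grapheme cluster when /u is in effect.
    return (int) preg_match_all('/\X/u', $s);
}

$bytes      = strlen($text);                  // 3
$codePoints = preg_match_all('/./su', $text); // 2
$graphemes  = count_graphemes($text);         // 1

printf("bytes=%d code points=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);
```

In "grapheme mode" as described above, a plain . in a pattern would behave like \X does here, and strlen would return the third number instead of the first.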
