What is the best way to split a string into an array of Unicode characters in PHP?
Problem description
In PHP, what is the best way to split a string into an array of Unicode characters, given that the input is not necessarily UTF-8?
I want to know whether the set of Unicode characters in an input string is a subset of another set of Unicode characters.
Why not run straight for the mb_ family of functions, as the first couple of answers didn't?
Recommended answer
You could use the 'u' modifier with a PCRE regex; see Pattern Modifiers (quoting):
u (PCRE8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
For instance, consider this code:
header('Content-type: text/html; charset=UTF-8'); // So the browser doesn't make our lives harder
$str = "abc 文字化け, efg";
$results = array();
preg_match_all('/./', $str, $results);
var_dump($results[0]);
You'll get an unusable result:
array
0 => string 'a' (length=1)
1 => string 'b' (length=1)
2 => string 'c' (length=1)
3 => string ' ' (length=1)
4 => string '�' (length=1)
5 => string '�' (length=1)
6 => string '�' (length=1)
7 => string '�' (length=1)
8 => string '�' (length=1)
9 => string '�' (length=1)
10 => string '�' (length=1)
11 => string '�' (length=1)
12 => string '�' (length=1)
13 => string '�' (length=1)
14 => string '�' (length=1)
15 => string '�' (length=1)
16 => string ',' (length=1)
17 => string ' ' (length=1)
18 => string 'e' (length=1)
19 => string 'f' (length=1)
20 => string 'g' (length=1)
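The 21 entries are the individual bytes of the string: without the 'u' modifier, '.' matches one byte at a time, and each of the four Japanese characters occupies three bytes in UTF-8. A small check of that reading, assuming the source file itself is saved as UTF-8:

```php
<?php
$str = "abc 文字化け, efg";

// Without /u, '.' matches single bytes, so the number of matches
// equals the byte length reported by strlen().
$byteMatches = preg_match_all('/./', $str);

var_dump(strlen($str));                 // int(21): 9 ASCII bytes + 4 * 3 bytes
var_dump($byteMatches === strlen($str)); // bool(true)
```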
However, with this code:
header('Content-type: text/html; charset=UTF-8'); // So the browser doesn't make our lives harder
$str = "abc 文字化け, efg";
$results = array();
preg_match_all('/./u', $str, $results);
var_dump($results[0]);
(Note the 'u' at the end of the regex.)
You get what you want:
array
0 => string 'a' (length=1)
1 => string 'b' (length=1)
2 => string 'c' (length=1)
3 => string ' ' (length=1)
4 => string '文' (length=3)
5 => string '字' (length=3)
6 => string '化' (length=3)
7 => string 'け' (length=3)
8 => string ',' (length=1)
9 => string ' ' (length=1)
10 => string 'e' (length=1)
11 => string 'f' (length=1)
12 => string 'g' (length=1)
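To tie this back to the question, here is a minimal sketch of the subset check built on the same 'u' modifier. The helper names utf8_chars and is_char_subset are hypothetical, not part of PHP. The sketch assumes PHP 7 or later, that the mbstring extension is available, and that the caller already knows the encoding of any non-UTF-8 input (reliably detecting an unknown encoding is a separate problem). It uses preg_split('//u', ...) rather than '/./u' because '.' skips newlines unless the 's' modifier is also given. Note that it splits into code points, so a base letter followed by a combining mark comes back as two entries.

```php
<?php
// Hypothetical helper: split a string into an array of Unicode code points.
// $encoding names the input's encoding; non-UTF-8 input is converted first.
function utf8_chars(string $input, string $encoding = 'UTF-8'): array
{
    if (strcasecmp($encoding, 'UTF-8') !== 0) {
        $input = mb_convert_encoding($input, 'UTF-8', $encoding);
    }

    // With /u, preg_split() returns false on malformed UTF-8
    // instead of silently splitting into bytes.
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $chars;
}

// Hypothetical helper: true when every character of $input also occurs in $allowed.
function is_char_subset(string $input, string $allowed): bool
{
    return array_diff(utf8_chars($input), utf8_chars($allowed)) === [];
}

var_dump(is_char_subset("化け", "abc 文字化け, efg")); // bool(true)
var_dump(is_char_subset("漢字", "abc 文字化け, efg")); // bool(false)
```

For input in another encoding, pass its name as the second argument, for example utf8_chars($latin1String, 'ISO-8859-1'), where $latin1String is a placeholder for your own data.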
Hope this helps :-)