您将如何创建所有UTF-8字符的字符串? [英] How would you create a string of all UTF-8 characters?
问题描述
有很多方法可以表示+1百万个 UTF-8字符.取带有大写字母(Ā
)的拉丁大写字母"A".这是Unicode代码点U+0100
,十六进制数0xc4 0x80
,十进制数196 128
和二进制11000100 10000000
.
There are many ways to represent the +1 million UTF-8 characters. Take the latin capital "A" with macron (Ā
). This is unicode code point U+0100
, hex number 0xc4 0x80
, decimal number 196 128
, and binary 11000100 10000000
.
我想创建用于测试应用程序的前65,535个UTF-8字符的集合.这些都是直到代码点U+FFFF
(byte3)的unicode字符.
I would like to create a collection of the first 65,535 UTF-8 characters for use in testing applications. These are all unicode characters up to code point U+FFFF
(byte3).
是否可以执行类似for($x=0)
循环的操作,然后将结果十进制转换为另一个基数(例如十六进制),从而允许创建匹配的unicode字符?
Is it possible to do something like a for($x=0)
loop and then convert the resulting decimal to another base (like hex) which would allow the creation of the matching unicode character?
我可以使用以下方式创建值Ā
:
I can create the value Ā
using something like this:
$char = "\xc4\x80";
// or
$char = chr(196).chr(128);
但是,我不确定如何将其转变为自动化过程.
However, I am not sure how to turn this into an automated process.
// fail!
$char = "\x". dechex($a). "\x". dexhex($b);
推荐答案
您可以利用iconv
(或其他一些函数)将代码点号转换为UTF-8字符串:
You can leverage iconv
(or a few other functions) to convert a code point number to a UTF-8 string:
function unichr($i)
{
return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}
$codeunits = array();
for ($i = 0; $i<0xD800; $i++)
$codeunits[] = unichr($i);
for ($i = 0xE000; $i<0xFFFF; $i++)
$codeunits[] = unichr($i);
$all = implode($codeunits);
(我避免使用替代范围0xD800–0xDFFF,因为它们本身不能有效地放入UTF-8;那应该是"CESU-8".)
(I avoided the surrogate range 0xD800–0xDFFF as they aren't valid to put in UTF-8 themselves; that would be "CESU-8".)
这篇关于您将如何创建所有UTF-8字符的字符串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!