How do I convert Unicode codepoints to hexadecimal HTML entities?
Question
I have a data file (an Apple plist, to be exact) that contains Unicode codepoints like \U00e8 and \U2019. I need to turn these into valid hexadecimal HTML entities using PHP.
What I'm doing right now is a long string of:
$fileContents = str_replace("\U00e8", "&#xe8;", $fileContents);
$fileContents = str_replace("\U2019", "&#x2019;", $fileContents);
Which is clearly dreadful. I could use a regular expression to convert the \U and the zeros that follow it to &#x, then stick on the trailing ;, but that also seems heavy-handed.
Is there a clean, simple way to take a string and replace all the Unicode codepoints with HTML entities?
Recommended answer
You can use preg_replace:
$fileContents = preg_replace('/\\\\U0*([0-9a-fA-F]{1,5})/', '&#x\1;', $fileContents);
Testing the RE (in PowerShell):
PS> 'some \U00e8 string with \U2019 embedded Unicode' -replace '\\U0*([0-9a-f]{1,5})','&#x$1;'
some &#xe8; string with &#x2019; embedded Unicode
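One caveat with the pattern above: because it accepts one to five hex digits, an escape immediately followed by a hex-looking letter (for example \U00e8abc) would be absorbed into a single, wrong entity. A minimal sketch of a stricter variant follows, assuming every escape in the file is a backslash, U, and exactly four hex digits; that assumption is not stated in the question. The function name plistEscapesToEntities is hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper (not part of the original answer): converts
// plist-style \Uxxxx escapes into hexadecimal HTML entities.
function plistEscapesToEntities(string $text): string
{
    // Assumption: each escape is a backslash, "U", and exactly four hex digits.
    // In a single-quoted PHP string, '\\\\U' becomes the regex \\U,
    // which matches a literal backslash followed by U.
    return preg_replace_callback(
        '/\\\\U([0-9a-fA-F]{4})/',
        function (array $m): string {
            // hexdec() discards leading zeros; dechex() returns lowercase hex.
            return '&#x' . dechex(hexdec($m[1])) . ';';
        },
        $text
    );
}

// Single quotes keep the backslashes literal, as they would be in the file.
$sample = 'caf\U00e9 and it\U2019s';
echo plistEscapesToEntities($sample), "\n";
// prints: caf&#xe9; and it&#x2019;s
```

With the four-digit assumption, \U00e8abc becomes &#xe8;abc rather than one five-digit entity. If the file can contain longer escapes, the quantifier would need to change accordingly.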