Convert Unicode from JSON string with PHP


Question


I've been reading up on a few solutions but have not managed to get anything to work as yet.

I have a JSON string that I read in from an API call and it contains Unicode characters - \u00c2\u00a3 for example is the £ symbol.

I'd like to use PHP to convert these into either £ or &pound;.

I'm looking into the problem and found the following code (using my pound symbol to test) but it didn't seem to work:

$title = preg_replace("/\\\\u([a-f0-9]{4})/e", "iconv('UCS-4LE','UTF-8',pack('V', hexdec('U$1')))", '\u00c2\u00a3');

The output is Â£.

Am I correct in thinking that this is UTF-16 encoded? How would I convert these to output as HTML?

UPDATE

It seems that the JSON string from the API has 2 or 3 unescaped Unicode strings, e.g.:

That\u00e2\u0080\u0099s (right single quotation)
\u00c2\u00a (pound symbol)

Solution

It is not UTF-16 encoding. It looks like bogus encoding instead, because the \uXXXX escape is independent of any UTF or UCS encoding of Unicode. \u00c2\u00a3 really maps to the string Â£.

What you should have is \u00a3, which is the Unicode code point for £.

{0xC2, 0xA3} is the UTF-8 encoded 2-byte character for this code point.
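The difference between the two escapes can be checked directly. The following is a minimal sketch, assuming the bundled json extension is available; the variable names are illustrative only. It decodes the correct escape and the bogus pair, then prints the resulting bytes in hex.

```php
<?php
// The correct escape: json_decode returns a UTF-8 string,
// so U+00A3 comes back as the two bytes C2 A3.
$pound = json_decode('"\u00a3"');
echo bin2hex($pound), "\n"; // c2a3

// The bogus escapes: each original byte was escaped as its own code point,
// so decoding yields U+00C2 followed by U+00A3, each re-encoded as UTF-8.
$bogus = json_decode('"\u00c2\u00a3"');
echo bin2hex($bogus), "\n"; // c382c2a3
```

The second result is four bytes long, which is why the pound sign shows up with an extra character in front of it.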

If, as I suspect, the software that encoded the original UTF-8 string to JSON was unaware that it was UTF-8 and blindly escaped each byte as a Unicode code point, then you need to convert each pair of escaped code points back into one UTF-8 encoded character, and then decode that to the native PHP encoding to make it printable.

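function fixBadUnicode($str) {
    // preg_replace_callback replaces the /e modifier, which was removed in PHP 7.
    $toBytes = function ($m) { return chr(hexdec($m[1])) . chr(hexdec($m[2])); };
    // utf8_decode() converts UTF-8 to ISO-8859-1 (deprecated since PHP 8.2).
    return utf8_decode(preg_replace_callback("/\\\\u00([0-9a-f]{2})\\\\u00([0-9a-f]{2})/", $toBytes, $str));
}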

Example here: http://phpfiddle.org/main/code/6sq-rkn

Edit:

If you want to fix the string in order to obtain a valid JSON string, you need to use the following function:

function fixBadUnicodeForJson($str) {
    // Turn every captured hex pair into its raw byte (replaces the /e modifier, removed in PHP 7).
    $toBytes = function ($m) {
        return implode('', array_map(function ($h) { return chr(hexdec($h)); }, array_slice($m, 1)));
    };
    $unit = "\\\\u00([0-9a-f]{2})";
    // Try the longest sequences first: 4 escapes, then 3, then 2, then single ones.
    for ($n = 4; $n >= 1; $n--) {
        $str = preg_replace_callback("/" . str_repeat($unit, $n) . "/", $toBytes, $str);
    }
    return $str;
}

Edit 2: fixed the previous function so that it transforms any UTF-8 byte sequence that was wrongly escaped as Unicode code points into the equivalent UTF-8 character.

Be aware that some of these characters, which probably come from an editor such as Word, cannot be represented in ISO-8859-1 and will therefore appear as '?' after utf8_decode.
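Since the question asks for HTML output, one way to avoid the '?' problem is to skip the ISO-8859-1 step and convert the repaired UTF-8 string to HTML entities instead. The sketch below is not the answer's original method: `unescapeByteRuns` is a hypothetical helper, and it assumes that every \u00XX escape in the input is a wrongly escaped UTF-8 byte (a genuine escape such as \u00e9 would be corrupted by it).

```php
<?php
// Hypothetical helper: turn each run of \u00XX escapes back into raw bytes.
function unescapeByteRuns(string $str): string {
    return preg_replace_callback('/(?:\\\\u00[0-9a-f]{2})+/i', function ($m) {
        // Drop the "\u00" prefixes so only hex digits remain, then pack them into bytes.
        return hex2bin(str_ireplace('\u00', '', $m[0]));
    }, $str);
}

$raw  = 'That\u00e2\u0080\u0099s \u00c2\u00a35';
$utf8 = unescapeByteRuns($raw);

// htmlentities() maps characters that have named HTML entities,
// such as the right single quote and the pound sign.
$html = htmlentities($utf8, ENT_QUOTES, 'UTF-8');
echo $html, "\n"; // That&rsquo;s &pound;5
```

Because the string stays in UTF-8 until it is converted to entities, characters outside ISO-8859-1 are not lost along the way.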
