在 php 中将 UCS2/HexEncoded 字符转换为 UTF8 [英] UCS2/HexEncoded characters to UTF8 in php

查看:33
本文介绍了在 php 中将 UCS2/HexEncoded 字符转换为 UTF8的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我之前问过一个问题,从 UTF-8 获取 UCS-2/HexEncoded 字符串,我从以下链接中的一些人那里得到了一些帮助.

I asked a question previously to get a UCS-2/HexEncoded string from UTF-8, and I got some help from some guys at the following link.

UCS2/HexEncoded 字符

但现在我需要从 PHP 中的 UCS-2/HexEncoded 字符串中获取正确的 UTF-8.

But now I need to get the correct UTF-8 from a UCS-2/HexEncoded string in PHP.

对于以下字符串:

00480065006C006C006F 将返回你好"

00480065006C006C006F will return 'Hello'

06450631062d0628064b06270020063906270644064500200021 将以阿拉伯语返回 (!مرحبا عالم)

06450631062d0628064b06270020063906270644064500200021 will return (!مرحبا عالم) in arabic

推荐答案

您可以通过使用 hexdec() 转换十六进制字符,重新打包组件字符,然后使用 mb_convert_encoding() 将 UCS-2 转换为 UTF-8.正如我在回答您的另一个问题时提到的,您仍然需要小心输出编码,尽管在这里您特别要求使用 UTF-8,因此我们将在接下来的示例中使用它.

You can recompose a Hex-representation by converting the hexadecimal chars with hexdec(), repacking the component chars, and then using mb_convert_encoding() to convert from UCS-2 into UTF-8. As I mentioned in my answer to your other question, you'll still need to be careful with the output encoding, although here you've specifically requested UTF-8, so we'll use that for the upcoming sample.

这是一个将十六进制的 UCS-2 转换为原生字符串形式的 UTF-8 的示例.由于 PHP 当前不附带 hex2bin() 函数,这会使事情变得非常简单,我们将使用最后发布在参考链接中的函数.我已将其重命名为 local_hex2bin(),以防它与 PHP 的未来版本或您项目中包含的其他一些第三方代码中的定义发生冲突.

Here's a sample that does the work of converting UCS-2 in Hex to UTF-8 in native string form. As PHP currently doesn't ship with a hex2bin() function, which would make things very easy, we'll use the one posted at the reference link at the end. I've renamed it to local_hex2bin() just in case it conflicts with a future version of PHP or with a definition in some other 3rd party code that you include in your project.

<?php
function local_hex2bin($h)
{
if (!is_string($h)) return null;
$r='';
for ($a=0; $a<strlen($h); $a+=2) { $r.=chr(hexdec($h{$a}.$h{($a+1)})); }
return $r;
};

header('Content-Type: text/html; charset=UTF-8');
mb_http_output('UTF-8');
echo '<html><head>';
echo '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />';
echo '</head><body>';
echo 'output encoding: '.mb_http_output().'<br />';
$querystring = $_SERVER['QUERY_STRING'];
// NOTE: we could substitute one of the following:
// $querystring = '06450631062d0628064b06270020063906270644064500200021';
// $querystring = '00480065006C006C006F';
$ucs2string = local_hex2bin($querystring);
// NOTE: The source encoding could also be UTF-16 here.
// TODO: Should check byte-order-mark, if available, in case
//       16-bit-aligned bytes are reversed.
$utf8string = mb_convert_encoding($ucs2string, 'UTF-8', 'UCS-2');
echo 'query string: '.$querystring.'<br />';
echo 'converted string: '.$utf8string.'<br />';
echo '</body>';
?>

在本地,我将此示例页面称为 UCS2HexToUTF8.php,然后使用查询字符串设置输出.

Locally, I called this sample page UCS2HexToUTF8.php, and then used a querystring to set the output.

UCS2HexToUTF8.php?06450631062d0628064b06270020063906270644064500200021
--
encoding: UTF-8
query string: 06450631062d0628064b06270020063906270644064500200021
converted string: مرحبًا عالم !

UCS2HexToUTF8.php?00480065006C006C006F
--
output encoding: UTF-8
query string: 00480065006C006C006F
converted string: Hello

这是 hex2bin() 函数原始源的链接.
PHP:bin2hex(),评论 #86123 @ php.net

Here's the link to the original source of the hex2bin() function.
PHP: bin2hex(), comment #86123 @ php.net

此外,正如我在调用 mb_convert_encoding() 之前的评论中所指出的,您可能想要尝试检测源正在使用哪种字节序,特别是如果您的应用程序有部分其中一台服务器上的一个或多个 CPU 在方向上与其他 CPU 不同.

Also, as noted in my comments before the call to mb_convert_encoding(), you'll probably want to try and detect which endian ordering is in use by the source, especially if your application has parts where one or more CPUs on one server differ from the rest by orientation.

这是一个可以帮助您识别字节顺序标记 (BOM) 的链接.
字节序标记@维基百科

Here's a link that can help you identify the byte-order marks (BOM).
Byte order mark @ Wikipedia

这篇关于在 php 中将 UCS2/HexEncoded 字符转换为 UTF8的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆