如何反转Unicode字符串 [英] How to reverse a Unicode string

查看:137
本文介绍了如何反转Unicode字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

评论此问题的答案中得到了提示表示PHP无法反转Unicode字符串.

It was hinted in a comment to an answer to this question that PHP can not reverse Unicode strings.

对于Unicode,它可以在PHP中使用 因为大多数应用将其处理为 二进制的.是的,PHP是8位干净的.尝试 等价于PHP:perl -Mutf8 -e'打印标量反向(ほげほげ")'您将得到垃圾, 不是げほげほ". – jrockway

As for Unicode, it works in PHP because most apps process it as binary. Yes, PHP is 8-bit clean. Try the equivalent of this in PHP: perl -Mutf8 -e 'print scalar reverse("ほげほげ")' You will get garbage, not "げほげほ". – jrockway

不幸的是,PHP Unicode支持atm充其量是正确的",这是正确的.这将希望与PHP6 .

And unfortunately it is correct that PHPs unicode support atm is at best "lacking". This will hopefully change drastically with PHP6.

PHPs MultiByte函数确实提供了您需要的基本功能处理unicode,但是它前后不一致并且确实缺少很多功能.其中之一是反转字符串的功能.

PHPs MultiByte functions does provide the basic functionality you need to deal with unicode, but it is inconsistent and does lack a lot of functions. One of these is a function to reverse a string.

我当然不希望出于其他原因撤消此案,然后再查明是否有可能.而且,我编写了一个函数来完成逆转此Unicode文本这一巨大的复杂任务,因此您可以放宽一点,直到PHP6.

I of course wanted to reverse this text for no other reason then to figure out if it was possible. And I made a function to accomplish this enormous complex task of reversing this Unicode text, so you can relax a bit longer until PHP6.

测试代码:

$enc = 'UTF-8';
$text = "ほげほげ";
$defaultEnc = mb_internal_encoding();

echo "Showing results with encoding $defaultEnc.\n\n";

$revNormal = strrev($text);
$revInt = mb_strrev($text);
$revEnc = mb_strrev($text, $enc);

echo "Original text is: $text .\n";
echo "Normal strrev output: " . $revNormal . ".\n";
echo "mb_strrev without encoding output: $revInt.\n";
echo "mb_strrev with encoding $enc output: $revEnc.\n";

if (mb_internal_encoding($enc)) {
    echo "\nSetting internal encoding to $enc from $defaultEnc.\n\n";

    $revNormal = strrev($text);
    $revInt = mb_strrev($text);
    $revEnc = mb_strrev($text, $enc);

    echo "Original text is: $text .\n";
    echo "Normal strrev output: " . $revNormal . ".\n";
    echo "mb_strrev without encoding output: $revInt.\n";
    echo "mb_strrev with encoding $enc output: $revEnc.\n";

} else {
    echo "\nCould not set internal encoding to $enc!\n";
}

推荐答案

Grapheme函数比mbstring和PCRE函数更正确地处理UTF-8字符串/Mbstring和PCRE可能会破坏字符.通过执行以下代码,您可以看到它们之间的差异.

Grapheme functions handle UTF-8 string more correctly than mbstring and PCRE functions/ Mbstring and PCRE may break characters. You can see the defference between them by executing the following code.

function str_to_array($string)
{
    $length = grapheme_strlen($string);
    $ret = [];

    for ($i = 0; $i < $length; $i += 1) {

        $ret[] = grapheme_substr($string, $i, 1);
    }

    return $ret;
}

function str_to_array2($string)
{
    $length = mb_strlen($string, "UTF-8");
    $ret = [];

    for ($i = 0; $i < $length; $i += 1) {

    $ret[] = mb_substr($string, $i, 1, "UTF-8");
}

    return $ret;
}

function str_to_array3($string)
{
    return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}

function utf8_strrev($string)
{
    return implode(array_reverse(str_to_array($string)));
}

function utf8_strrev2($string)
{
    return implode(array_reverse(str_to_array2($string)));
}

function utf8_strrev3($string)
{
    return implode(array_reverse(str_to_array3($string)));
}

// http://www.php.net/manual/en/function.grapheme-strlen.php
$string = "a\xCC\x8A"  // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
         ."o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS'  (U+00F6)

var_dump(array_map(function($elem) { return strtoupper(bin2hex($elem)); },
[
  'should be' => "o\xCC\x88"."a\xCC\x8A",
  'grapheme' => utf8_strrev($string),
  'mbstring' => utf8_strrev2($string),
  'pcre' => utf8_strrev3($string)
]));

结果在这里.

array(4) {
  ["should be"]=>
  string(12) "6FCC8861CC8A"
  ["grapheme"]=>
  string(12) "6FCC8861CC8A"
  ["mbstring"]=>
  string(12) "CC886FCC8A61"
  ["pcre"]=>
  string(12) "CC886FCC8A61"
}

从PHP 5.5(intl 3.0)开始可以使用IntlBreakIterator;

IntlBreakIterator can be used since PHP 5.5 (intl 3.0);

function utf8_strrev($str)
{
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($str);

    $ret = '';
    $pos = 0;
    $prev = 0;

    foreach ($it as $pos) {
        $ret = substr($str, $prev, $pos - $prev) . $ret;
        $prev = $pos;
    }

    return $ret;  
}

这篇关于如何反转Unicode字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆