How to get the code point number for a given character in a UTF-8 string?


Problem description

I want to get the UCS-2 code points for a given UTF-8 string. For example, the word "hello" should become something like "0068 0065 006C 006C 006F". Please note that the characters could come from any language, including complex scripts such as the East Asian languages.

So the problem comes down to "convert a given character to its UCS-2 code point".

But how? Any kind of help would be very much appreciated, since I am in a great hurry.

Asker's reply, originally posted as an answer

Thanks for your reply, but it needs to be done in PHP 4 or 5, not 6.

The string will be user input from a form field.

I want to implement a PHP version of utf8to16 or utf8decode, like this:

function get_ucs2_codepoint($char)
{
    // calculation of ucs2 codepoint value and assign it to $hex_codepoint
    return $hex_codepoint;
}

Can you help me do this in PHP, and can it be done with the PHP versions mentioned above?

Recommended answer

Scott Reynen wrote a function to convert UTF-8 into Unicode. I found it while looking through the PHP documentation.

function utf8_to_unicode( $str ) {

    $unicode = array();
    $values = array();      // bytes collected for the current multi-byte character
    $lookingFor = 1;        // number of bytes the current character needs
    $length = strlen( $str );

    for ( $i = 0; $i < $length; $i++ ) {
        $thisValue = ord( $str[ $i ] );
        if ( $thisValue < ord('A') ) {
            if ( $thisValue >= ord('0') && $thisValue <= ord('9') ) {
                // digits 0-9 pass through unchanged
                $unicode[] = chr( $thisValue );
            } else {
                // other bytes below 'A' become %xx, zero-padded to two hex digits
                $unicode[] = sprintf( '%%%02x', $thisValue );
            }
        } elseif ( $thisValue < 128 ) {
            // remaining ASCII passes through unchanged
            $unicode[] = $str[ $i ];
        } else {
            // a lead byte below 224 starts a 2-byte sequence, otherwise 3 bytes are assumed
            // (4-byte sequences, i.e. code points above U+FFFF, are not handled)
            if ( count( $values ) == 0 ) $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
            $values[] = $thisValue;
            if ( count( $values ) == $lookingFor ) {
                $number = ( $lookingFor == 3 ) ?
                    ( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 ):
                    ( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
                // always emit four hex digits, e.g. %u00e9
                $unicode[] = sprintf( '%%u%04x', $number );
                $values = array();
                $lookingFor = 1;
            } // if
        } // if
    } // for
    return implode( "", $unicode );

} // utf8_to_unicode
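
Note that the function above returns `%`-style escapes rather than the space-separated form asked for in the question. As a rough sketch of how the "0068 0065 006C 006C 006F" output could be produced, the helper below decodes each UTF-8 sequence by hand using only `ord`, `sprintf` and `implode`, so it should run on old PHP versions without extensions. The name `utf8_to_codepoints` is hypothetical and not part of the original answer. It assumes well-formed UTF-8 input and does no validation. Characters above U+FFFF fall outside UCS-2 and would simply be printed with more than four hex digits.

```php
<?php
// Hypothetical helper (not from the original answer): returns the code points
// of a UTF-8 string as space-separated, zero-padded uppercase hex.
// Assumes $str is well-formed UTF-8; malformed input is not validated.
function utf8_to_codepoints($str)
{
    $out = array();
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {            // 1 byte:  0xxxxxxx
            $code = $byte;
            $extra = 0;
        } elseif ($byte < 0xE0) {      // 2 bytes: 110xxxxx
            $code = $byte & 0x1F;
            $extra = 1;
        } elseif ($byte < 0xF0) {      // 3 bytes: 1110xxxx
            $code = $byte & 0x0F;
            $extra = 2;
        } else {                       // 4 bytes: 11110xxx
            $code = $byte & 0x07;
            $extra = 3;
        }
        // Fold in the low 6 bits of each continuation byte (10xxxxxx).
        for ($j = 1; $j <= $extra && $i + $j < $len; $j++) {
            $code = ($code << 6) | (ord($str[$i + $j]) & 0x3F);
        }
        $out[] = sprintf('%04X', $code);
        $i += $extra + 1;
    }
    return implode(' ', $out);
}

echo utf8_to_codepoints("hello"), "\n"; // 0068 0065 006C 006C 006F
```

Since the input comes from a form field, it may be worth checking that the submitted bytes really are UTF-8 before decoding, because this sketch will produce meaningless values for malformed sequences.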
