Convert Unicode characters above 127 to decimal

Question

Possible Duplicate:
How to convert text to unicode code point like \u0054\u0068\u0069\u0073 using php?

I'm trying to convert every character that doesn't fit into 7-bit ASCII into an escaped form, \uN, where N is its decimal code point value. Here's what I've come up with:

private static function escape($str) {
    return preg_replace_callback('~[\\x{007F}-\\x{FFFF}]~u',function($m){return '\\u'.ord($m[0]);},$str);
}

I've tried it with characters like Gamma:

echo self::escape('Γ');

But I get \u206 back instead of \u915. I can't figure out where I'm going wrong. Any ideas?

Actually, it appears that either the ord() function doesn't give me the value I want, or maybe the encoding of my .php file is wrong?

Recommended answer

I had to refresh my memory on exactly how UTF-8 works, but here is a utf8_ord() function and a complementary utf8_chr(). The utf8_chr() is lifted pretty much verbatim from another answer of mine.

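// Decode one UTF-8 encoded character into its Unicode code point.
// Returns false when the lead/continuation byte patterns do not match.
function utf8_ord ($chr)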
{
    $bytes = array_values(unpack('C*', $chr));

    switch (count($bytes)) {
        case 1:
            return $bytes[0] < 0x80
                ? $bytes[0]
                : false;
        case 2:
            return ($bytes[0] & 0xE0) === 0xC0 && ($bytes[1] & 0xC0) === 0x80
                ? (($bytes[0] & 0x1F) << 6) | ($bytes[1] & 0x3F)
                : false;
        case 3:
            return ($bytes[0] & 0xF0) === 0xE0 && ($bytes[1] & 0xC0) === 0x80 && ($bytes[2] & 0xC0) === 0x80 
                ? (($bytes[0] & 0x0F) << 12) | (($bytes[1] & 0x3F) << 6) | ($bytes[2] & 0x3F)
                : false;
        case 4:
            return ($bytes[0] & 0xF8) === 0xF0 && ($bytes[1] & 0xC0) === 0x80 && ($bytes[2] & 0xC0) === 0x80 && ($bytes[3] & 0xC0) === 0x80
                ? (($bytes[0] & 0x07) << 18) | (($bytes[1] & 0x3F) << 12) | (($bytes[2] & 0x3F) << 6) | ($bytes[3] & 0x3F)
                : false;
    }

    return false;
}

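// Encode a Unicode code point as a UTF-8 string (1 to 4 bytes).
// Returns false when $ord is 0x110000 or greater.
function utf8_chr ($ord)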
{
    switch (true) {
        case $ord < 0x80:
            return pack('C*', $ord & 0x7F);
        case $ord < 0x0800:
            return pack('C*', (($ord & 0x07C0) >> 6) | 0xC0, ($ord & 0x3F) | 0x80);
        case $ord < 0x010000:
            return pack('C*', (($ord & 0xF000) >> 12) | 0xE0, (($ord & 0x0FC0) >> 6) | 0x80, ($ord & 0x3F) | 0x80);
        case $ord < 0x110000:
            return pack('C*', (($ord & 0x1C0000) >> 18) | 0xF0, (($ord & 0x03F000) >> 12) | 0x80, (($ord & 0x0FC0) >> 6) | 0x80, ($ord & 0x3F) | 0x80);
    }

    return false;
}
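
For what it's worth, the \u206 in the question comes from ord() reading only the first byte of the string: in UTF-8, Γ (U+0393) is stored as the two bytes 0xCE 0x93, and 0xCE is 206. Decoding both bytes as in the 2-byte case of utf8_ord() gives ((0xCE & 0x1F) << 6) | (0x93 & 0x3F) = 915. Below is a minimal sketch of how the question's escape() could be wired to a code point function instead of ord(). The name escape_non_ascii() is made up for illustration, the pattern [^\x00-\x7F] is my substitution for the question's range (it leaves DEL alone and also covers characters above U+FFFF), and the mb_ord() fallback assumes PHP 7.2+ with the mbstring extension. It also assumes the source file and the input are UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): replaces every
// non-ASCII character with \uN, where N is its decimal code point.
// Returns null if preg_replace_callback() fails, e.g. on invalid UTF-8.
function escape_non_ascii(string $str): ?string
{
    // Use utf8_ord() from the answer if it is defined; otherwise fall
    // back to the built-in mb_ord() (assumption: mbstring is loaded).
    $ord = function_exists('utf8_ord')
        ? 'utf8_ord'
        : function (string $c) { return mb_ord($c, 'UTF-8'); };

    return preg_replace_callback(
        '~[^\x00-\x7F]~u',               // /u: match whole characters, not bytes
        function (array $m) use ($ord) {
            return '\\u' . $ord($m[0]);  // $m[0] holds all bytes of one character
        },
        $str
    );
}

echo ord('Γ'), "\n";               // first byte only
echo escape_non_ascii('Γ'), "\n";  // full code point
```

The key difference from the original attempt is that the callback decodes all bytes of the matched character, rather than passing a multi-byte string to ord().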
