Check to see if a string is encoded as UTF-8
Question
function seems_utf8($str) {
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n = 1; # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n = 2; # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n = 3; # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n = 4; # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n = 5; # 1111110b
        else return false; # Does not match any model
        for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
I got this code from WordPress. I don't know much about it, but I would like to know exactly what is happening in that function.
If anyone knows, please help me out.
I need a clear idea of what the above code does. A line-by-line explanation would be the most helpful.
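To make the question concrete, the lead-byte tests in the function can be traced on a sample string. The sketch below is not part of WordPress: `describe_lead_byte` is a hypothetical helper that repeats the same mask comparisons and reports how many continuation bytes (of the form 10bbbbbb) each lead byte announces, and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper (not part of WordPress): mirrors the mask tests in
// seems_utf8() and returns how many continuation bytes a lead byte announces,
// or -1 if the byte matches none of the lead-byte patterns.
function describe_lead_byte(int $c): int {
    if ($c < 0x80) return 0;            // 0bbbbbbb: plain ASCII
    if (($c & 0xE0) == 0xC0) return 1;  // 110bbbbb
    if (($c & 0xF0) == 0xE0) return 2;  // 1110bbbb
    if (($c & 0xF8) == 0xF0) return 3;  // 11110bbb
    if (($c & 0xFC) == 0xF8) return 4;  // 111110bb
    if (($c & 0xFE) == 0xFC) return 5;  // 1111110b
    return -1;                          // e.g. a stray 10bbbbbb byte
}

// Assumed sample: "a", then "é" (2 bytes), then "€" (3 bytes), all UTF-8.
$sample = "a\xC3\xA9\xE2\x82\xAC";
$length = strlen($sample);
$lines = [];
for ($i = 0; $i < $length; $i++) {
    $c = ord($sample[$i]);
    $n = describe_lead_byte($c);
    $lines[] = sprintf("byte %d = 0x%02X, expects %d continuation byte(s)", $i, $c, $n);
    $i += max($n, 0); // skip the continuation bytes of this character
}
echo implode("\n", $lines), "\n";
```

For this sample the loop visits bytes 0, 1 and 3 (0x61, 0xC3, 0xE2), expecting 0, 1 and 2 continuation bytes respectively. Note that the check is purely structural: the 5- and 6-byte patterns date from the original UTF-8 design, while RFC 3629 limits UTF-8 to 4 bytes, so a strict validator can reject input that a check of this shape accepts.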
Solution
I use two ways to check whether a string is UTF-8 (depending on the case):
mb_internal_encoding('UTF-8'); // always needed before mb_ functions, check note below
if (mb_strlen($string) != strlen($string)) {
    // not single byte
}
-- OR --
if (preg_match('!\S!u', $string)) {
    // utf8
}
About mb_internal_encoding: due to a PHP bug I don't understand (in versions below 5.3; I haven't tested 5.3 itself), passing the encoding as a parameter to the mb_ functions doesn't work, so the internal encoding needs to be set before any use of mb_ functions.
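How the two checks behave on sample input can be sketched as follows. The sample strings are made up for illustration, and `mb_check_encoding` is added as a third option that the answer does not mention; the sketch assumes the mbstring extension is loaded. Note that the first check detects multi-byte content rather than validity, and the second yields 0 for a string made up only of whitespace.

```php
<?php
mb_internal_encoding('UTF-8'); // set first, as the answer recommends

$ascii  = "plain";        // single-byte characters only
$utf8   = "caf\xC3\xA9";  // "café" in UTF-8: 5 bytes, 4 characters
$latin1 = "caf\xE9";      // "café" in ISO-8859-1: not valid UTF-8

// Check 1: the character count differs from the byte count as soon as
// a multi-byte character is present.
$utf8HasMultibyte  = mb_strlen($utf8) != strlen($utf8);
$asciiHasMultibyte = mb_strlen($ascii) != strlen($ascii);

// Check 2: with the /u modifier, PCRE rejects subjects that are not valid
// UTF-8, so preg_match() returns false instead of 1.
$utf8Matches   = preg_match('!\S!u', $utf8);
$latin1Matches = preg_match('!\S!u', $latin1);

// Not in the original answer: mbstring's own validity check.
$utf8Valid   = mb_check_encoding($utf8, 'UTF-8');
$latin1Valid = mb_check_encoding($latin1, 'UTF-8');

var_dump(
    $utf8HasMultibyte,   // bool(true)
    $asciiHasMultibyte,  // bool(false)
    $utf8Matches,        // int(1)
    $latin1Matches,      // bool(false)
    $utf8Valid,          // bool(true)
    $latin1Valid         // bool(false)
);
```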