UTF8 Encoding problem - With good examples
Problem description
I have the following character encoding issue: somehow I have managed to save data with different character encodings into my database (UTF8). The code and outputs below show two sample strings and how they are output. One of them needs to be converted to UTF8 and the other already is UTF8.
How should I go about checking whether to encode a string or not? I need each string to be output correctly, so how do I check whether it is already UTF8 or whether it needs to be converted?
I am using PHP 5.2 with MySQL MyISAM tables:
CREATE TABLE IF NOT EXISTS `entities` (
....
`title` varchar(255) NOT NULL
....
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
<?php
$text = $entity['Entity']['title'];
echo 'Original : ', $text."<br />";
echo 'UTF8 Encode : ', utf8_encode($text)."<br />";
echo 'UTF8 Decode : ', utf8_decode($text)."<br />";
echo 'TRANSLIT : ', iconv("ISO-8859-1", "UTF-8//TRANSLIT", $text)."<br />";
echo 'IGNORE TRANSLIT : ', iconv("ISO-8859-1", "UTF-8//IGNORE//TRANSLIT", $text)."<br />";
echo 'IGNORE : ', iconv("ISO-8859-1", "UTF-8//IGNORE", $text)."<br />";
echo 'Plain : ', iconv("ISO-8859-1", "UTF-8", $text)."<br />";
?>
Output 1:
Original : France Télécom
UTF8 Encode : France Télécom
UTF8 Decode : France T�l�com
TRANSLIT : France Télécom
IGNORE TRANSLIT : France Télécom
IGNORE : France Télécom
Plain : France Télécom
Output 2:
Original : Cond� Nast Publications
UTF8 Encode : Condé Nast Publications
UTF8 Decode : Cond?ast Publications
TRANSLIT : Condé Nast Publications
IGNORE TRANSLIT : Condé Nast Publications
IGNORE : Condé Nast Publications
Plain : Condé Nast Publications
Thanks for your time on this one. Character encoding and I don't get on very well!
UPDATE:
echo strlen($string)."|".strlen(utf8_encode($string))."|";
echo (strlen($string)!==strlen(utf8_encode($string))) ? $string : utf8_encode($string);
echo "<br />";
echo strlen($string)."|".strlen(utf8_decode($string))."|";
echo (strlen($string)!==strlen(utf8_decode($string))) ? $string : utf8_decode($string);
echo "<br />";
23|24|Cond� Nast Publications
23|21|Cond� Nast Publications
16|20|France Télécom
16|14|France Télécom
This may be a job for the mb_detect_encoding() function.
In my limited experience with it, it is not 100% reliable when used as a generic "encoding sniffer": it checks for the presence of certain characters and byte values to make an educated guess. In this narrow case, though, where it only needs to distinguish between UTF-8 and ISO-8859-1, it should work.
<?php
$text = $entity['Entity']['title'];
echo 'Original : ', $text."<br />";
$enc = mb_detect_encoding($text, "UTF-8,ISO-8859-1");
echo 'Detected encoding '.$enc."<br />";
echo 'Fixed result: '.iconv($enc, "UTF-8", $text)."<br />";
?>
You may get incorrect results for strings that do not contain special characters, but that is not a problem, because plain ASCII bytes are identical in UTF-8 and ISO-8859-1.
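As a possible alternative to guessing the encoding, the same narrow decision (UTF-8 or ISO-8859-1) can be sketched as a validity check: if the bytes already form valid UTF-8, leave them alone, otherwise treat them as ISO-8859-1 and convert. The helper name to_utf8() and the sample strings are illustrative assumptions, not part of the original post, and the sketch assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not from the original post): returns $text as UTF-8,
// assuming the only possible input encodings are UTF-8 and ISO-8859-1.
function to_utf8($text)
{
    // Valid UTF-8, which includes plain ASCII, is returned untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Anything else is assumed to be ISO-8859-1 and converted.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// Sample byte strings modelled on the two titles in the question.
$samples = array(
    "France T\xC3\xA9l\xC3\xA9com",  // already UTF-8 (0xC3 0xA9 is "é")
    "Cond\xE9 Nast Publications",     // ISO-8859-1 (single byte 0xE9 is "é")
);

foreach ($samples as $sample) {
    echo to_utf8($sample), "\n";
}
```

Both samples should come out as correctly encoded UTF-8. One caveat of this approach: an ISO-8859-1 string whose bytes happen to form valid UTF-8 sequences would be left unconverted, and text that was encoded twice is not repaired.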