UTF8 Encoding problem - With good examples

Views: 134 | Published: 2016/11/19 13:05:25 | Tags: php mysql utf-8 character-encoding


Problem description

I have the following character encoding issue: somehow I have managed to save data with different character encodings into my database (UTF8). The code and outputs below show two sample strings and how they are output. One of them needs to be converted to UTF-8 and the other already is UTF-8.

How should I go about checking whether I should encode a string or not? I need each string to be output correctly, so how do I check whether it is already UTF-8 or whether it needs to be converted?

I am using PHP 5.2 with MySQL MyISAM tables:

CREATE TABLE IF NOT EXISTS `entities` (
  ....
  `title` varchar(255) NOT NULL
  ....
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

<?php
$text = $entity['Entity']['title'];
echo 'Original : ', $text."<br />";
echo 'UTF8 Encode : ', utf8_encode($text)."<br />";
echo 'UTF8 Decode : ', utf8_decode($text)."<br />";
echo 'TRANSLIT : ', iconv("ISO-8859-1", "UTF-8//TRANSLIT", $text)."<br />";
echo 'IGNORE TRANSLIT : ', iconv("ISO-8859-1", "UTF-8//IGNORE//TRANSLIT", $text)."<br />";
echo 'IGNORE   : ', iconv("ISO-8859-1", "UTF-8//IGNORE", $text)."<br />";
echo 'Plain    : ', iconv("ISO-8859-1", "UTF-8", $text)."<br />";
?>

Output 1:

Original : France Télécom
UTF8 Encode : France TÃ©lÃ©com
UTF8 Decode : France T�l�com
TRANSLIT : France TÃ©lÃ©com
IGNORE TRANSLIT : France TÃ©lÃ©com
IGNORE : France TÃ©lÃ©com
Plain : France TÃ©lÃ©com

Output 2:

Original : Cond� Nast Publications
UTF8 Encode : Condé Nast Publications
UTF8 Decode : Cond?ast Publications
TRANSLIT : Condé Nast Publications
IGNORE TRANSLIT : Condé Nast Publications
IGNORE : Condé Nast Publications
Plain : Condé Nast Publications

Thanks for your time on this one. Character encoding and I don't get on very well!

UPDATE:

echo strlen($string)."|".strlen(utf8_encode($string))."|";
echo (strlen($string)!==strlen(utf8_encode($string))) ? $string : utf8_encode($string);
echo "<br />";
echo strlen($string)."|".strlen(utf8_decode($string))."|";
echo (strlen($string)!==strlen(utf8_decode($string))) ? $string : utf8_decode($string);
echo "<br />";

23|24|Cond� Nast Publications
23|21|Cond� Nast Publications

16|20|France Télécom
16|14|France Télécom

Solution

This may be a job for the mb_detect_encoding() function.

In my limited experience with it, it's not 100% reliable when used as a generic "encoding sniffer": it checks for the presence of certain characters and byte values to make an educated guess. But in this narrow case (it only needs to distinguish between UTF-8 and ISO-8859-1) it should work.

<?php
$text = $entity['Entity']['title'];

echo 'Original : ', $text."<br />";
$enc = mb_detect_encoding($text, "UTF-8,ISO-8859-1");

echo 'Detected encoding '.$enc."<br />";

echo 'Fixed result: '.iconv($enc, "UTF-8", $text)."<br />";

?>

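You may get incorrect results for strings that do not contain special characters, but that is not a problem: pure-ASCII strings are byte-for-byte identical in UTF-8 and ISO-8859-1, so the conversion leaves them unchanged either way.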
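
If guessing the encoding feels too loose, the same decision can also be made by validating the bytes instead. The following is only a minimal sketch, not part of the original answer. It assumes the mbstring extension is loaded and that any title which is not valid UTF-8 was stored as ISO-8859-1. The helper name to_utf8() is hypothetical.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: a string that is not valid UTF-8 is ISO-8859-1.
function to_utf8($text)
{
    // Valid UTF-8 (which includes plain ASCII) is returned untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise reinterpret the bytes as ISO-8859-1 and convert them.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// The two sample titles from the question, written as raw bytes.
$alreadyUtf8 = "France T\xC3\xA9l\xC3\xA9com";     // 16 bytes, valid UTF-8
$latin1      = "Cond\xE9 Nast Publications";       // 23 bytes, ISO-8859-1

echo to_utf8($alreadyUtf8), "<br />";
echo to_utf8($latin1), "<br />";
```

This would be called as to_utf8($entity['Entity']['title']) before output. One known limit of the approach: an ISO-8859-1 string whose bytes happen to form valid UTF-8 sequences is treated as UTF-8 and left as-is. That is unlikely for ordinary titles but not impossible.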
