Fixing broken UTF-8 encoding

Problem description

I am in the process of fixing some bad UTF-8 encoding. I am currently using PHP 5 and MySQL.

In my database I have a few instances of bad encodings that print like: Ã®

  • The database collation is utf8_general_ci
  • PHP is using a proper UTF-8 header
  • Notepad++ is set to use UTF-8 without BOM
  • Database management is handled in phpMyAdmin
  • Not all cases of accented characters are broken

I need some sort of function that will help me map the instances of Ã®, Ã­, Ã¼ and others like it to their proper accented UTF-8 characters.

Solution

I've had to try to 'fix' a number of broken UTF-8 situations in the past, and unfortunately it's never easy, and often outright impossible.

Unless you can determine exactly how it was broken, and it was always broken in that exact same way, then it's going to be hard to 'undo' the damage.
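To make "exactly how it was broken" concrete, here is a minimal sketch of what is probably the most common failure: correct UTF-8 bytes get read as Windows-1252 somewhere along the way and are then re-encoded as UTF-8. The choice of Windows-1252 is an assumption for illustration only; the data in question may have been damaged in some other way.

```php
<?php
// Correct UTF-8 for "î" (U+00EE), written as byte escapes so the example
// does not depend on how this file itself is saved.
$good = "\xC3\xAE";

// Simulate the damage: treat those two bytes as the Windows-1252 characters
// "Ã" and "®", and encode each of them as UTF-8.
$once = mb_convert_encoding($good, 'UTF-8', 'Windows-1252');

// Making the same mistake a second time damages the text further.
$twice = mb_convert_encoding($once, 'UTF-8', 'Windows-1252');

echo bin2hex($good), "\n";   // c3ae
echo bin2hex($once), "\n";   // c383c2ae
echo bin2hex($twice), "\n";  // longer again: every pass adds more bytes
```

If some rows were damaged once, some twice, and some not at all, a single blanket conversion cannot repair all of them, which is why the history of the damage matters so much.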

If you want to try to undo the damage, your best bet would be to start writing some sample code, where you attempt numerous variations on calls to mb_convert_encoding() to see if you can find a combination of 'from' and 'to' that fixes your data. In the end, it's often best not to bother fixing the old data at all, given the pain involved, and instead to just fix things going forward.
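That trial-and-error approach could be sketched roughly as follows. try_repair() is a hypothetical helper name, and the candidate charset list is an assumption; the sketch only covers the case where UTF-8 text was misread as a single-byte charset exactly once.

```php
<?php
// Hypothetical helper: tries to undo ONE round of "UTF-8 bytes misread as a
// legacy single-byte charset". Returns the plausible repairs keyed by charset.
function try_repair($broken, $candidates = array('ISO-8859-1', 'Windows-1252'))
{
    $repairs = array();
    foreach ($candidates as $legacy) {
        // Mapping the text back into the legacy charset recovers the raw
        // bytes that were originally misinterpreted.
        $bytes = mb_convert_encoding($broken, $legacy, 'UTF-8');

        // Accept only results that changed and are themselves valid UTF-8.
        // Characters the legacy charset cannot represent come back as "?",
        // so every accepted result still needs a human look.
        if ($bytes !== $broken && mb_check_encoding($bytes, 'UTF-8')) {
            $repairs[$legacy] = $bytes;
        }
    }
    return $repairs;
}

$broken  = "\xC3\x83\xC2\xAE"; // "Ã®", the damaged form of "î"
$repairs = try_repair($broken);
foreach ($repairs as $legacy => $text) {
    echo $legacy, ' => ', bin2hex($text), "\n"; // both candidates give c3ae
}
```

A string that is already correct yields no candidates, because its Latin-1 bytes are not valid UTF-8. It would still be prudent to run anything like this against a copy of the table and review the proposed changes before writing them back.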

However, before doing this, you need to make sure that you fix everything that is causing this issue in the first place. You've already mentioned that your DB table collation and editors are set properly. But there are more places where you need to check to make sure that everything is properly UTF-8:

  • Make sure that you are serving your HTML as UTF-8:
    • header("Content-Type: text/html; charset=utf-8");
  • Change your PHP default charset to utf-8:
    • ini_set("default_charset", 'utf-8');
  • If your database doesn't ALWAYS talk in utf-8, then you may need to tell it on a per-connection basis to ensure it's in utf-8 mode. In MySQL you do that by issuing the following statement right after connecting (the shorter "charset utf8" form only works inside the mysql command-line client):
    • SET NAMES utf8
  • You may need to tell your web server to always serve content as UTF-8; in Apache the directive is:
    • AddDefaultCharset UTF-8
  • Finally, you need to ALWAYS make sure that you are using PHP functions that are properly UTF-8 compliant. This means always using the mb_* styled 'multibyte aware' string functions. It also means that when calling functions such as htmlspecialchars(), you include the appropriate 'utf-8' charset parameter at the end to make sure that it doesn't encode them incorrectly.
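The PHP-side items in the checklist above can be illustrated with a short sketch. It assumes the mbstring extension is loaded; the mb_internal_encoding() call and the sample string are additions for illustration and are not part of the original advice.

```php
<?php
// Assumption: the mbstring extension is available.
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8'); // not mentioned above; default for mb_* calls

$name = "Zo\xC3\xAB"; // "Zoë" written as UTF-8 bytes

// Byte-oriented functions count and cut bytes, not characters.
$byteLength = strlen($name);                    // 4
$cutBytes   = substr($name, 0, 3);              // ends with half of "ë"

// The multibyte-aware equivalents work on characters.
$charLength = mb_strlen($name, 'UTF-8');        // 3
$cutChars   = mb_substr($name, 0, 3, 'UTF-8');  // the whole of "Zoë"

// Passing the charset explicitly keeps htmlspecialchars() from guessing.
$html = htmlspecialchars('<b>' . $name . '</b>', ENT_QUOTES, 'UTF-8');

var_dump($byteLength, $charLength);               // int(4) int(3)
var_dump(mb_check_encoding($cutBytes, 'UTF-8'));  // bool(false)
var_dump($cutChars === $name);                    // bool(true)
echo $html, "\n";
```

The substr() result is the kind of value that later shows up as a broken character in the database: it is no longer valid UTF-8, even though nothing in the code looks wrong at first glance.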

If you slip up on any one step through your whole process, the encoding can be mangled and problems arise. Once you get in the 'groove' of doing utf-8 though, this all becomes second nature. And of course, PHP6 is supposed to be fully unicode compliant from the get-go, which will make lots of this easier (hopefully).
