UTF8 workflow PHP, MySQL summarized

Views: 78 Published: 2020/5/15 0:47:11 Tags: php mysql unicode utf-8 workflow

Problem description

I am working for international clients who all use very different alphabets, so I am trying to finally get an overview of a complete workflow between PHP and MySQL that ensures all characters are inserted with the correct encoding. I have read a bunch of tutorials on this but still have questions (there is much to learn), so I thought I would put it all together here and ask.

PHP

header('Content-Type:text/html; charset=UTF-8');
mb_internal_encoding('UTF-8');

HTML

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<form accept-charset="UTF-8"> .. </form>

(The latter is optional and only a hint to the browser, but I would rather give the hint than do nothing.)

MySQL

CREATE DATABASE database_name DEFAULT CHARACTER SET utf8; or ALTER DATABASE database_name DEFAULT CHARACTER SET utf8; and/or use utf8_general_ci as the MySQL connection collation.

(It is important to note here that this will increase the database size if it uses VARCHAR columns.)

Connection

mysql_query("SET NAMES 'utf8'");
mysql_query("SET CHARACTER SET utf8");
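
For comparison, the connection character set can also be requested in a PDO DSN instead of through separate SET queries. The following is only a minimal sketch: the helper name build_utf8_dsn and the host and database values are hypothetical, and it assumes the pdo_mysql driver, which honors the charset DSN option as of PHP 5.3.6.

```php
<?php
// Hypothetical helper: builds a pdo_mysql DSN that asks the driver to use
// the given connection character set.
function build_utf8_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = build_utf8_dsn('localhost', 'database_name');

// Connecting needs a reachable server and real credentials, so the call is
// left commented out in this sketch:
// $pdo = new PDO($dsn, 'user', 'password');

echo $dsn, "\n";
```

Keeping the character set in one place (the DSN) means it is applied when the connection is opened rather than depending on a query being issued afterwards.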

Business logic

Detect whether input is not UTF-8 with mb_detect_encoding() and convert it with iconv().
Validate overlong sequences of UTF-8 and UTF-16:

$body=preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]|(?<=^|[\x00-\x7F])[\x80-\xBF]+|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/','�',$body);
$body=preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]|\xED[\xA0-\xBF][\x80-\xBF]/S','?', $body);
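
The detect-and-convert step described above could look roughly like the sketch below. The helper name to_utf8 and the ISO-8859-1 fallback are assumptions made for illustration: a legacy encoding cannot be detected reliably from the bytes alone, so the source encoding of non-UTF-8 input has to be known or guessed.

```php
<?php
// Hypothetical helper: returns $input as UTF-8. Bytes that are already valid
// UTF-8 are returned unchanged; anything else is ASSUMED to be in $fallback.
function to_utf8(string $input, string $fallback = 'ISO-8859-1'): string
{
    // Strict mode makes mb_detect_encoding() reject invalid byte sequences.
    if (mb_detect_encoding($input, 'UTF-8', true) === 'UTF-8') {
        return $input;
    }

    $converted = iconv($fallback, 'UTF-8', $input);
    if ($converted === false) {
        throw new RuntimeException('Could not convert input to UTF-8');
    }
    return $converted;
}

$latin1 = "caf\xE9";                    // "café" encoded as ISO-8859-1
echo bin2hex(to_utf8($latin1)), "\n";   // 636166c3a9
```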

Questions

Is mb_internal_encoding('UTF-8') necessary in PHP 5.3 and higher, and if so, does this mean I have to use the multibyte functions instead of the core functions, for example mb_substr() instead of substr()?
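
To make the difference behind this question concrete, here is a small sketch comparing the byte-oriented core functions with their mbstring counterparts. It assumes the mbstring extension is loaded; the sample word is arbitrary.

```php
<?php
mb_internal_encoding('UTF-8');

$word = "n\xC3\xA4he";   // "nähe": 4 characters, 5 bytes in UTF-8

// substr() counts bytes, so it can cut a multibyte character in half.
$bytes = substr($word, 0, 2);      // "n" plus the first byte of "ä"
// mb_substr() counts characters in the internal (or explicitly given) encoding.
$chars = mb_substr($word, 0, 2);   // "nä"

var_dump(strlen($word));                        // int(5)
var_dump(mb_strlen($word));                     // int(4)
var_dump(mb_check_encoding($bytes, 'UTF-8'));   // bool(false)
var_dump(mb_check_encoding($chars, 'UTF-8'));   // bool(true)
```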

Is it still necessary to check for malformed input strings, and if so, what is a reliable function/class to do so? I possibly do not want to strip bad data, and I don't know enough about transliteration.
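
One possible way to check input without stripping anything is to test it and let the caller decide whether to reject it. This is only a sketch, assuming the mbstring extension; the helper name is_valid_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: reports whether $input is well-formed UTF-8.
// It does not modify or strip anything.
function is_valid_utf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

$good     = "gr\xC3\xBC\xC3\x9F";   // "grüß" as UTF-8
$bad      = "gr\xFC\xDF";           // ISO-8859-1 bytes, not valid UTF-8
$overlong = "\xC0\xAF";             // overlong encoding of "/"

var_dump(is_valid_utf8($good));       // bool(true)
var_dump(is_valid_utf8($bad));        // bool(false)
var_dump(is_valid_utf8($overlong));   // bool(false)
```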

Should it really be utf8_general_ci, or rather utf8_bin?

Is there something missing in the above workflow?

Sources:

http://coding.smashingmagazine.com/2012/06/06/all-about-unicode-utf8-character-sets/  
http://webcollab.sourceforge.net/unicode.html  
http://stackoverflow.com/a/3742879/1043231  
http://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql/  
http://akrabat.com/php/utf8-php-and-mysql/
