是“SET CHARACTER SET utf8"吗?必要的? [英] Is "SET CHARACTER SET utf8" necessary?

查看:31
本文介绍了是“SET CHARACTER SET utf8"吗?必要的?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在重写我们的数据库类(基于 PDO),并被困在这个问题上.在 PHP 和 MySQL 中使用 UTF-8 时,我被教导要同时使用 SET NAMES utf8SET CHARACTER SET utf8.

I´m rewritting our database class (PDO based), and got stuck at this. I´ve been taught to both use SET NAMES utf8 and SET CHARACTER SET utf8 when working with UTF-8 in PHP and MySQL.

在 PDO 中,我现在想使用 PDO::MYSQL_ATTR_INIT_COMMAND 参数,但它只支持一个查询.

In PDO I now want to use the PDO::MYSQL_ATTR_INIT_COMMAND parameter, but it only supports one query.

是否需要SET CHARACTER SET utf8?

推荐答案

在使用 SET NAMES utf8 之后使用 SET CHARACTER SET utf8 实际上会重置 character_set_connectioncollat​​ion_connection
@@character_set_database@@collat​​ion_database 分别.

Using SET CHARACTER SET utf8 after using SET NAMES utf8 will actually reset the character_set_connection and collation_connection to
@@character_set_database and @@collation_database respectively.

手册指出

  • SET NAMES x 等价于

SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

  • SET CHARACTER SET x等价于

    SET character_set_client = x;
    SET character_set_results = x;
    SET collation_connection = @@collation_database;
    

  • SET collat​​ion_connection = x 也在内部执行 SET character_set_connection = <<character_set_of_collat​​ion_x>>SET character_set_connection = x 也在内部执行执行 SET collat​​ion_connection = <<default_collat​​ion_of_character_set_x.

    whereas SET collation_connection = x also internally executes SET character_set_connection = <<character_set_of_collation_x>> and SET character_set_connection = x internally also executes SET collation_connection = <<default_collation_of_character_set_x.

    所以本质上,您将 character_set_connection 重置为 @@character_set_database 并将 collat​​ion_connection 重置为 @@collat​​ion_database.手册解释了这些变量的用法:

    So essentially you're resetting character_set_connection to @@character_set_database and collation_connection to @@collation_database. The manual explains the usage of these variables:

    服务器应该使用什么字符集将语句翻译为 after收到了吗?

    为此,服务器使用character_set_connection 和collat​​ion_connection 系统变量.它转换由发送的语句客户端从 character_set_client 到character_set_connection(除了字符串字面量介绍人,如 _latin1 或 _utf8).collat​​ion_connection 对文字字符串的比较.为了字符串与列的比较值,collat​​ion_connection 不很重要,因为列有自己的整理,具有更高的排序规则优先.

    For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

    总而言之,MySQL 用来处理查询及其结果的编码/转码过程是一个多步骤的事情:

    To sum this up, the encoding/transcoding procedure MySQL uses to process the query and its results is a multi-step-thing:

    1. MySQL 将传入查询视为在 character_set_client 中编码.
    2. MySQL 将语句从 character_set_client 转码为 character_set_connection
    3. 当比较字符串值与列值时,MySQL 将字符串值从 character_set_connection 转码为给定数据库列的字符集,并使用列排序规则进行排序和比较.
    4. MySQL 构建以 character_set_results 编码的结果集(这包括结果数据以及结果元数据,例如列名等)
    1. MySQL treats the incoming query as being encoded in character_set_client.
    2. MySQL transcodes the statement from character_set_client into character_set_connection
    3. when comparing string values to column values MySQL transcodes the string value from character_set_connection into the character set of the given database column and uses the column collation to do sorting and comparison.
    4. MySQL builds up the result set encoded in character_set_results (this includes result data as well as result metadata such as column names and so on)

    因此,SET CHARACTER SET utf8 可能不足以提供完整的 UTF-8 支持.考虑 latin1 的默认数据库字符集和用 utf8-charset 定义的列,并执行上述步骤.由于 latin1 无法覆盖 UTF-8 可以覆盖的所有字符,因此您可能会在步骤 3 中丢失字符信息.

    So it could be the case that a SET CHARACTER SET utf8 would not be sufficient to provide full UTF-8 support. Think of a default database character set of latin1 and columns defined with utf8-charset and go through the steps described above. As latin1 cannot cover all the characters that UTF-8 can cover you may lose character information in step 3.

    • 步骤3:鉴于您的查询以 UTF-8 编码并且包含无法用 latin1 表示的字符,这些字符将在从 utf8 转码到 latin1(默认数据库字符集)时丢失,使您的查询失败.
    • Step 3: Given that your query is encoded in UTF-8 and contains characters that cannot be represented with latin1, these characters will be lost on transcoding from utf8 to latin1 (the default database character set) making your query fail.

    所以我认为可以肯定地说 SET NAMES ... 是处理字符集问题的正确方法.尽管我可能会补充说,正确设置 MySQL 服务器变量(所有必需的变量都可以在 my.cnf 中静态设置)可以将您从每次连接所需的额外查询的性能开销中解放出来.

    So I think it's safe to say that SET NAMES ... is the correct way to handle character set issues. Even though I might add that setting up your MySQL server variables correctly (all the required variables can be set statically in your my.cnf) frees you from the performance overhead of the extra query required on every connect.

    这篇关于是“SET CHARACTER SET utf8"吗?必要的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆