utf8mb4_unicode_ci vs utf8mb4_bin

Views: 1676 Published: 2020/5/15 3:38:53 Tags: php mysql utf-8 character-encoding


Question

So first, let me check whether I have this right:

A charset is a set of symbols and encodings. A collation is a set of rules for comparing characters in a charset.

I should use utf8mb4 because MySQL's utf8 is a fraud: it is limited to 3 bytes per character and is not the true up-to-4-byte UTF-8 used in PHP, for example.
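The byte-length difference can be checked directly in MySQL. This is a minimal sketch, assuming a server that supports utf8mb4; the hex literal `F09F9880` (the UTF-8 encoding of U+1F600) is only an illustrative choice of a 4-byte character and does not come from the original post.

```sql
-- U+1F600 encoded as UTF-8 takes four bytes: F0 9F 98 80.
-- The _utf8mb4 introducer tells MySQL to read the hex bytes as utf8mb4 text.
SELECT LENGTH(_utf8mb4 X'F09F9880')      AS byte_length,  -- 4
       CHAR_LENGTH(_utf8mb4 X'F09F9880') AS char_length;  -- 1
```

A column declared with MySQL's 3-byte utf8 charset cannot store such a character, which is the practical reason for preferring utf8mb4.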

As such, utf8mb4 is a charset, and utf8mb4_unicode_ci and utf8mb4_bin are two of its many available collations.

utf8_unicode_ci does case-insensitive comparison and other special comparisons (I heard it messes up all the accents in French, for example). utf8_bin is case-sensitive because it compares the binary values of the characters.
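The difference between the two collations can be seen by comparing string literals directly. This is a small sketch using the utf8mb4 variants of both collations; the literals are illustrative only, and `X'C3A9'` is the UTF-8 encoding of é, written in hex so the example does not depend on the client's connection charset.

```sql
-- Case: equal under the case-insensitive collation, different under the binary one.
SELECT _utf8mb4'a' COLLATE utf8mb4_unicode_ci = _utf8mb4'A' AS ci_equal,   -- 1
       _utf8mb4'a' COLLATE utf8mb4_bin        = _utf8mb4'A' AS bin_equal;  -- 0

-- Accents: compare é (X'C3A9') with plain e under each collation.
SELECT _utf8mb4 X'C3A9' COLLATE utf8mb4_unicode_ci = _utf8mb4'e' AS ci_equal,
       _utf8mb4 X'C3A9' COLLATE utf8mb4_bin        = _utf8mb4'e' AS bin_equal;
```

Whether treating an accented letter as equal to its base letter counts as "messing up" depends on what the application expects; the binary collation never treats two different byte sequences as equal.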

Now the questions:

If, for example, I want to allow case-sensitive login names using utf8mb4_unicode_ci, I will have to do things like:

SELECT name FROM `table` WHERE BINARY name = 'MyNaMEiSFUlloFUPPERCases';

If, for example, I want to allow case-insensitive search using utf8mb4_bin, I will have to do things like:

SELECT name FROM `table` WHERE LOWER(name) LIKE '%myname%';

So which one is better? And what about the bad things I hear about utf8_unicode_ci and accents or other special characters?

Thanks :)

Recommended answer

Did you "get things right"? Yes, except that I think French accents are 'correctly' compared in utf8mb4_unicode_520_ci.

Your two SELECTs will both do a full table scan and will therefore be inefficient. The reason is that you are overriding the collation (BINARY, for #1), hiding the column in a function (LOWER, for #2), or using a leading wildcard (LIKE '%...').

If you want it to be efficient, declare name with COLLATE utf8mb4_bin and simply do WHERE name = ....
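As a rough sketch of that advice: the table name `users`, the column sizes, and the index name `uq_users_name` below are hypothetical and chosen only for illustration. The point is that the collation lives on the column, so the query needs neither BINARY nor a function around name.

```sql
-- Hypothetical table: the name column itself carries the binary collation.
CREATE TABLE users (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
    UNIQUE KEY uq_users_name (name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Case-sensitive lookup with no BINARY cast, so the index on name is usable.
SELECT name FROM users WHERE name = 'MyNaMEiSFUlloFUPPERCases';

-- EXPLAIN can be used to check which access path the optimizer picks.
EXPLAIN SELECT name FROM users WHERE name = 'MyNaMEiSFUlloFUPPERCases';
```

A leading-wildcard search such as LIKE '%myname%' would still scan the whole table regardless of the collation chosen.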

Do you think some of these equivalences and orderings are 'incorrect' for French?

A=a=ª=À=Á=Â=Ã=Ä=Å=à=á=â=ã=ä=å=Ā=ā=Ą=ą  Aa  ae=Æ=æ  az  B=b  C=c=Ç=ç=Ć=ć=Č=č  ch  cz
D=d=Ð=ð=Ď=ď  dz  E=e=È=É=Ê=Ë=è=é=ê=ë=Ē=ē=Ĕ=ĕ=Ė=ė=Ę=ę=Ě=ě  F=f  fz  ƒ  G=g=Ğ=ğ=Ģ=ģ
gz  H=h  hz  I=i=Ì=Í=Î=Ï=ì=í=î=ï=Ī=ī=Į=į=İ  ij=ĳ  iz  ı  J=j  K=k=Ķ=ķ
L=l=Ĺ=ĺ=Ļ=ļ=Ł=ł  lj=Ǉ=ǈ=ǉ  ll  lz  M=m  N=n=Ñ=ñ=Ń=ń=Ņ=ņ=Ň=ň  nz
O=o=º=Ò=Ó=Ô=Õ=Ö=Ø=ò=ó=ô=õ=ö=ø  oe=Œ=œ  oz  P=p  Q=q  R=r=Ř=ř  S=s=Ś=ś=Ş=ş=Š=š  sh
ss=ß  sz  T=t=Ť=ť  TM=tm=™  tz  U=u=Ù=Ú=Û=Ü=ù=ú=û=ü=Ū=ū=Ů=ů=Ų=ų  ue  uz  V=v  W=w  X=x
Y=y=Ý=ý=ÿ=Ÿ  yz  Z=z=Ź=ź=Ż=ż=Ž=ž  zh  zz  Þ=þ  µ

More utf8 collations. 8.0 and utf8mb4 collations.

The "520" (newer) version differs by not treating Æ, Ð, Ł, and Ø as separate 'letters', and perhaps in other ways.
