PHP: Unicode accented characters and diacritics

Published: 2021/5/4 19:15:10. Tags: php, unicode, encoding, tinymce


Question

On our website, some Mac users run into trouble when they copy-paste text from PDF files into a textarea (handled by TinyMCE). All accented characters are corrupted: for example, é becomes e? and î becomes i?. I cannot reproduce the problem on a Windows computer.

When I wrote the content of the textarea to a file (before inserting it into the database), I discovered that the pasted é looks visually different from a regular é when viewed in Vim.

Indeed:

// the corrupted é: "e" followed by U+0301 COMBINING ACUTE ACCENT
echo bin2hex($char); // prints 65cc81

// the regular, precomposed é (U+00E9)
echo bin2hex('é');   // prints c3a9

After a lot of searching, this is where I am: it seems that Mac OS copies accented Unicode characters as a combination of two characters, in our example e followed by the combining acute accent ́ (U+0301). So far I haven't found a way to replace the corrupted é with the real one, so that e? does not end up in the database.

I'm a bit desperate.

Recommended answer

The process of converting text to one representation or the other is called normalization. PHP has the Normalizer class (part of the intl extension) for this, and passing all input through it is a good idea:

$input = Normalizer::normalize($input);

You likely want to normalize to form C (NFC): canonical decomposition followed by canonical composition.
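To make the effect of form C concrete, here is a minimal sketch using the byte sequences from the question. It assumes PHP 7 or later (for the `\u{...}` escapes) and that the intl extension is loaded; the variable names are only illustrative.

```php
<?php
// Decomposed form: "e" (U+0065) followed by U+0301 COMBINING ACUTE ACCENT.
$decomposed = "e\u{0301}";
// Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
$precomposed = "\u{00E9}";

echo bin2hex($decomposed), "\n";  // 65cc81
echo bin2hex($precomposed), "\n"; // c3a9

// Normalizer is provided by the intl extension; FORM_C is NFC.
$normalized = Normalizer::normalize($decomposed, Normalizer::FORM_C);

echo bin2hex($normalized), "\n";        // c3a9
var_dump($normalized === $precomposed); // bool(true)

// The pasted text is not in form C before normalization.
var_dump(Normalizer::isNormalized($decomposed, Normalizer::FORM_C)); // bool(false)
```

After normalization the pasted é is byte-for-byte identical to the precomposed one, so it should be stored like any other é.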

If that class is not available on your system, there's the Patchwork UTF-8 library.
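One way to guard against the class being missing is sketched below. The helper name `normalize_input` is hypothetical and not part of the original answer. Returning the input unchanged when no Normalizer is available is an assumption made for illustration; a real application might prefer to fail loudly or load a replacement library first.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): normalize text to NFC
 * when a Normalizer class exists, otherwise return the text unchanged.
 */
function normalize_input(string $input): string
{
    if (!class_exists('Normalizer')) {
        // Assumption: neither intl nor a replacement is loaded; leave input as-is.
        return $input;
    }

    $normalized = Normalizer::normalize($input, Normalizer::FORM_C);

    // normalize() returns false on failure (for example, invalid UTF-8).
    return $normalized === false ? $input : $normalized;
}

echo bin2hex(normalize_input('abc')), "\n"; // 616263 (ASCII is unaffected by NFC)
```

Calling the helper at the point where form data enters the application keeps the normalization in one place.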
