How to "remove diacritics" from UTF-8 characters in PHP?
Question
I need to replicate the behavior of the MySQL utf8_general_ci collation in PHP. Strictly speaking, I need to detect which strings would be considered different and which would be considered the same. The case-insensitive part is easy. The problem is that utf8_general_ci considers characters with diacritics and characters without diacritics to be equal: e = è = é, and so on. To replicate that comparison, I need a way to replace è -> e, é -> e.
The approach that came to mind is:
echo iconv("utf-8", "ascii//TRANSLIT", "é");
One problem is that iconv behaves differently depending on the current locale, which is asking for trouble.
The other problem is that the input may also contain Cyrillic letters, which should neither be stripped nor trigger a PHP Notice:
echo iconv("utf-8", "ascii//TRANSLIT", "дом");
Is there a solution, or do I have to manually create a mapping from each character with a diacritic to its counterpart without one?
Recommended answer
The intl extension's Transliterator class lets you define far more in-depth transliteration rules. The full documentation on transliteration rules can be found on icu-project.org.
$tests = [ "é", "дом" ];
$tl = Transliterator::create('Latin-ASCII;');
foreach ($tests as $str) {
    var_dump(
        $tl->transliterate($str)
    );
}
Output:
string(1) "e"
string(6) "дом"
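Since the question is ultimately about comparing strings rather than printing them, the transliterator can be wrapped in a small comparison helper. The following is only a sketch, assuming the intl and mbstring extensions are loaded; the function names fold_for_comparison and same_ignoring_case_and_accents are hypothetical and not part of the original answer. It approximates utf8_general_ci rather than reproducing it exactly: for example, Latin-ASCII expands ß to ss, whereas MySQL documents utf8_general_ci as comparing ß equal to s, and diacritics on non-Latin letters are left untouched.

```php
<?php
// Hypothetical helper (not from the original answer): folds a string so that
// two inputs can be compared ignoring case and Latin diacritics.
function fold_for_comparison(string $s): string
{
    // 'Latin-ASCII' strips diacritics from Latin letters and leaves
    // other scripts, such as Cyrillic, unchanged.
    $tl = Transliterator::create('Latin-ASCII');
    if ($tl === null) {
        throw new RuntimeException('Could not create the Latin-ASCII transliterator');
    }

    $folded = $tl->transliterate($s);
    if ($folded === false) {
        throw new RuntimeException('Transliteration failed');
    }

    // Lowercase with mbstring so that non-ASCII letters are folded as well.
    return mb_strtolower($folded, 'UTF-8');
}

// Hypothetical helper: true when both strings fold to the same value.
function same_ignoring_case_and_accents(string $a, string $b): bool
{
    return fold_for_comparison($a) === fold_for_comparison($b);
}

var_dump(same_ignoring_case_and_accents('É', 'e'));     // bool(true)
var_dump(same_ignoring_case_and_accents('Дом', 'дом')); // bool(true)
var_dump(same_ignoring_case_and_accents('дом', 'dom')); // bool(false)
```

Unlike the iconv approach, this does not depend on the current locale. If many strings are compared, creating the transliterator once and reusing it would avoid repeated setup.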