Transliterate any convertible utf8 char into ascii equivalent

Views: 125 | Published: 2020/5/27 2:33:15 | Tags: php utf-8 ascii iconv transliteration


Problem description

Is there any good solution out there that handles this transliteration well?

I've tried using iconv(), but it is very annoying and does not behave as one might expect.

Using //TRANSLIT will try to replace what it can, leaving everything non-convertible as "?".
Using //IGNORE will not leave "?" in the text, but it will also not transliterate, and it raises an E_NOTICE when a non-convertible char is found, so you have to use iconv with the @ error suppressor.
Using //IGNORE//TRANSLIT (as some people suggested in the PHP forum) is actually the same as //IGNORE (tried it myself on PHP versions 5.3.2 and 5.3.13).
Also, using //TRANSLIT//IGNORE is the same as //TRANSLIT.

It also uses the current locale settings to transliterate.

Warning: a lot of text and code follows!

Here are some examples:

$text = 'Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + ¶ + @';
echo '<br />original: ' . $text;
echo '<br />regular: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> regular: Regular ascii text + ????? + ???ss + ?????? + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'en_GB');
echo '<br />en_GB: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> en_GB: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'en_GB.UTF8'); // will this work?
echo '<br />en_GB.UTF8: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> en_GB.UTF8: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

OK, that did convert č ć ž š ä ö ü ß é ĕ ě ė ë ȩ and æ, but why not đ and ø?

// now specific locales
setlocale(LC_ALL, 'hr_Hr'); // this should fix croatian đ, right?
echo '<br />hr_Hr: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// wrong > hr_Hr: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'sv_SE'); // so this will fix swedish ø?
echo '<br />sv_SE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// will not > sv_SE: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

//this is interesting
setlocale(LC_ALL, 'de_DE');
echo '<br />de_DE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> de_DE: Regular ascii text + cczs? + aeoeuess + eeeeee + ae?EUR + $ + ? + @
// actually this is what any german would expect since ä ö ü really is same as ae oe ue
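
Since the output above changes with the active locale, one option is to scope the locale change to the iconv() call and restore the previous setting afterwards. The following is only a sketch: translit_in_locale() is a hypothetical helper name, and it assumes that LC_CTYPE is the category iconv() consults (the examples above set LC_ALL, which includes it) and that the requested locale is installed on the system.

```php
<?php
/**
 * Hypothetical helper: transliterate under a specific LC_CTYPE locale,
 * then put the previous locale back.
 *
 * @return string|false the converted text, or false if iconv() fails
 */
function translit_in_locale(string $text, string $locale)
{
    // Passing '0' only queries the current setting; it changes nothing.
    $previous = setlocale(LC_CTYPE, '0');
    // Returns false when the locale is not installed; iconv() then runs
    // under whatever locale was already active.
    $applied = setlocale(LC_CTYPE, $locale);

    try {
        // @ hides the E_NOTICE iconv() may raise on non-convertible chars.
        return @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    } finally {
        if ($applied !== false && $previous !== false) {
            setlocale(LC_CTYPE, $previous);
        }
    }
}
```

Calling translit_in_locale($text, 'de_DE') would then behave like the de_DE example above without leaving the whole script in that locale. The result still depends on which locales and which iconv implementation the server provides.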

Let's try using //IGNORE:

echo '<br />ignore: ' . iconv("UTF-8", "ASCII//IGNORE", $text);
//> ignore: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 49"

// with translit?
echo '<br />ignore/translit: ' . iconv("UTF-8", "ASCII//IGNORE//TRANSLIT", $text);
//same as ignore only> ignore/translit: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 54"

// translit/ignore?
echo '<br />translit/ignore: ' . iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $text);
//same as translit only> translit/ignore: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

Using this guy's solution also does not work as wanted: Regular ascii text + YYYYY + aous + eYYYeY + aoY + $ + � + @

Even using the PECL intl Normalizer class (which is not always available even if you have PHP > 5.3.0, since the ICU package that intl uses may not be available to PHP, e.g. on certain hosting servers) produces the wrong result:

echo '<br />normalize: ' .preg_replace('/\p{Mn}/u', '', Normalizer::normalize($text, Normalizer::FORM_KD));
//>normalize: Regular ascii text + cczsđ + aouß + eeeeee + æø€ + $ + ¶ + @
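
The Normalizer output above leaves đ, ß, æ, ø and € untouched because those characters have no decomposition into a base letter plus combining marks. A possible workaround, sketched here under assumptions, is to strip the marks first and then map the leftovers by hand. normalize_then_map() and the $leftovers table are hypothetical names, and the chosen replacements (for example đ to d) are just one convention.

```php
<?php
// Hypothetical table for the characters NFKD leaves intact in the example above.
$leftovers = ['đ' => 'd', 'ß' => 'ss', 'æ' => 'ae', 'ø' => 'o', '€' => 'EUR'];

function normalize_then_map(string $text, array $leftovers): string
{
    // intl may be missing on some hosts, so only use Normalizer when it exists.
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
        if ($decomposed !== false) {
            // Drop the combining marks (category Mn) left behind by NFKD.
            $text = preg_replace('/\p{Mn}/u', '', $decomposed) ?? $text;
        }
    }
    // Map whatever has no decomposition; unmapped characters (such as ¶) stay as-is.
    return strtr($text, $leftovers);
}

echo normalize_then_map('čćžšđ + äöüß + æø€', $leftovers), PHP_EOL;
```

Without the intl extension only the table entries are applied, so accented letters outside the table pass through unchanged.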

So is there any other way of doing this right, or is the only proper thing to do to use preg_replace() or str_replace() and define the transliteration tables yourself?
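
For reference, the hand-rolled table approach mentioned in the question could look roughly like this. It is a minimal sketch, not a complete solution: transliterate_with_table() is a hypothetical name, the table covers only the characters from the test string above, and the replacements (such as đ to dj) are one possible convention. It uses strtr() rather than str_replace() because strtr() with an array never re-replaces text it has already substituted.

```php
<?php
// Hypothetical, deliberately incomplete table: extend it for the languages you need.
$table = [
    'č' => 'c', 'ć' => 'c', 'ž' => 'z', 'š' => 's', 'đ' => 'dj',
    'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss',
    'é' => 'e', 'ĕ' => 'e', 'ě' => 'e', 'ė' => 'e', 'ë' => 'e', 'ȩ' => 'e',
    'æ' => 'ae', 'ø' => 'o', '€' => 'EUR',
];

function transliterate_with_table(string $text, array $table, string $fallback = '?'): string
{
    // Longest keys are tried first, and replaced text is not scanned again.
    $mapped = strtr($text, $table);

    // Any code point still outside 7-bit ASCII has no table entry.
    $ascii = preg_replace('/[^\x00-\x7F]/u', $fallback, $mapped);
    if ($ascii === null) {
        // preg_replace() returns null with /u when the input is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $ascii;
}

$text = 'Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + ¶ + @';
echo transliterate_with_table($text, $table), PHP_EOL;
// prints: Regular ascii text + cczsdj + aouss + eeeeee + aeoEUR + $ + ? + @
```

The result does not depend on the locale or on the iconv implementation, at the cost of maintaining the table by hand. The script file itself has to be saved as UTF-8 for the keys to match.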

// appendix: I have found a debate on the ZF wiki from 2008 about a proposal for Zend_Filter_Transliterate, but the project was dropped since in some languages it is not possible to convert (e.g. Chinese). Still, for any Latin- and Cyrillic-based language, IMO this option should exist.
