Check the language of a string based on glyphs in PHP

Question

I have a MySQL database with book titles in both English and Arabic and I'm using a PHP class that can automatically transliterate Arabic text into Latin script.

I'd like my output HTML to look something like this:

<h3>A book</h3>
<h3>كتاب <em>(kitaab)</em></h3>
<h3>Another book</h3>

Is there a way for PHP to determine the language of a string based on the Unicode characters and glyphs used in it? I'm trying to get something like this:

$Ar = new Arabic('EnTransliteration');
while ($item = mysql_fetch_array($results)) {
    ...
    if (some test to see if $item['item_title'] has Arabic glyphs in it) {
      echo "<h3>$item[item_title] <em>(" . $Ar->ar2en($item['item_title']) . ")</em></h3>";
    } else {
      echo "<h3>$item[item_title]</h3>";
    }
    ...
}

Fortunately the class doesn't choke when fed Latin characters, so in theory I could send every result through the transformation, but that seems like a waste of processing.

Thanks!

I still haven't found a way to check for glyphs or characters. I suppose I could put all the Arabic characters in an array and check if anything in the array matches a part of the string...

I did, however, figure out an interim solution that might work fine in the end. It puts every title through the transformation regardless of language, but only outputs the parenthetical transliteration if the string was changed:

while ($item = mysql_fetch_array($mysql_results)) {
    $transliterate = trim(strtolower($Ar->ar2en($item['item_title'])));
    $item_title = (strtolower($item['item_title']) == $transliterate) ? $item['item_title'] : $item['item_title'] . " <em>($transliterate)</em>";

    echo "<h3>$item_title</h3>";
}

Recommended answer

This should do it:

preg_match("/\p{Arabic}/u", $item['item_title'])

You could make that regular expression a bit more sophisticated if you want to, but I don't think you really need to.

The \p escape sequence lets you select characters based on their Unicode properties (when the u pattern modifier is used).
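As a rough sketch of how that test could slot into the loop from the question, the check can be wrapped in a small function. The names containsArabic and formatTitle are hypothetical helpers invented for this illustration, and the $transliterate callable merely stands in for a call such as $Ar->ar2en(); none of these come from the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when
// the string contains at least one character from the Arabic script.
function containsArabic(string $text): bool
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // which is required for \p{...} to match Unicode properties.
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/\p{Arabic}/u', $text) === 1;
}

// Hypothetical helper: builds the <h3> markup shown in the question.
// $transliterate is a stand-in for something like [$Ar, 'ar2en'].
function formatTitle(string $title, callable $transliterate): string
{
    if (containsArabic($title)) {
        return "<h3>$title <em>(" . $transliterate($title) . ")</em></h3>";
    }
    return "<h3>$title</h3>";
}
```

Inside the loop this would presumably be used as echo formatTitle($item['item_title'], [$Ar, 'ar2en']); so the transliteration only runs for titles that actually contain Arabic characters.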

The PHP manual mentions: "Extended properties such as "Greek" or "InMusicalSymbols" are not supported by PCRE." But that's not entirely true anymore. PCRE release 6.5 added support for script names.
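If it is unclear whether a particular installation's PCRE accepts script names, one way to find out is to try compiling a pattern and see whether preg_match() reports failure. The following is only a sketch: pcreSupportsScript is a hypothetical helper name, and it relies on preg_match() returning false when a pattern does not compile.

```php
<?php
// Hypothetical helper: reports whether this PCRE build accepts the given
// Unicode script name inside \p{...}.
function pcreSupportsScript(string $script): bool
{
    // Allow only letters and underscores so the name can be placed
    // in the pattern without further escaping.
    if (preg_match('/^[A-Za-z_]+\z/', $script) !== 1) {
        return false;
    }
    // A pattern that fails to compile makes preg_match() return false;
    // the @ operator suppresses the accompanying warning.
    return @preg_match('/\p{' . $script . '}/u', '') !== false;
}
```

Calling pcreSupportsScript('Arabic') once at startup would indicate whether the \p{Arabic} test can be used, or whether a fallback such as the transliterate-and-compare approach from the question is needed.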
