Check the language of a string based on glyphs in PHP

Question

I have a MySQL database with book titles in both English and Arabic and I'm using a PHP class that can automatically transliterate Arabic text into Latin script.

I'd like my output HTML to look something like this:

<h3>A book</h3>
<h3>كتاب <em>(kitaab)</em></h3>
<h3>Another book</h3>

Is there a way for PHP to determine the language of a string based on the Unicode characters and glyphs used in it? I'm trying to get something like this:

$Ar = new Arabic('EnTransliteration');
while ($item = mysql_fetch_array($results)) {
    ...
    if (some test to see if $item['item_title'] has Arabic glyphs in it) {
      echo "<h3>$item[item_title] <em>(" . $Ar->ar2en($item['item_title']) . ")</em></h3>";
    } else {
      echo "<h3>$item[item_title]</h3>";
    }
    ...
}

Fortunately the class doesn't choke when fed Latin characters, so in theory I could send every result through the transformation, but that seems like a waste of processing.

Thanks!

I still haven't found a way to check for glyphs or characters. I suppose I could put all the Arabic characters in an array and check if anything in the array matches a part of the string...

I did, however, figure out an interim solution that might work fine in the end. It puts every title through the transformation regardless of language, but only outputs the parenthetical transliteration if the string was changed:

while ($item = mysql_fetch_array($mysql_results)) {
    $transliterate = trim(strtolower($Ar->ar2en($item['item_title'])));
    $item_title = (strtolower($item['item_title']) == $transliterate) ? $item['item_title'] : $item['item_title'] . " <em>($transliterate)</em>";

    echo "<h3>$item_title</h3>";
}

Recommended answer

This should do it:

preg_match("/\p{Arabic}/u", $item['item_title'])

You could make that regular expression a bit more sophisticated if you want to, but I don't think you really need to.
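As one possible reading of "more sophisticated", the sketch below treats a title as Arabic only when at least half of its letters are in the Arabic script, so an English title that merely quotes one Arabic word is not transliterated. The function name `isMostlyArabic` and the 0.5 threshold are assumptions made for illustration, not part of the original answer.

```php
<?php
/**
 * Hypothetical helper: true when at least $threshold of the letters
 * in $text belong to the Arabic script.
 */
function isMostlyArabic(string $text, float $threshold = 0.5): bool
{
    // Count every letter, whatever its script.
    $letters = preg_match_all('/\p{L}/u', $text);
    if ($letters === false || $letters === 0) {
        return false; // invalid UTF-8, or no letters at all
    }

    // Count only letters whose script is Arabic (the lookahead
    // excludes Arabic-script digits and punctuation).
    $arabicLetters = preg_match_all('/(?=\p{L})\p{Arabic}/u', $text);

    return $arabicLetters !== false
        && ($arabicLetters / $letters) >= $threshold;
}
```

For the simple case in the question, where each title is either English or Arabic, the plain `preg_match` test is probably enough.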

The \p escape sequence lets you select characters based on their Unicode properties (when the u pattern modifier is used).
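A minimal sketch of how the test might be wrapped up and used when rendering a title. The names `containsArabic` and `renderTitle` are made up for this example, and the transliterator is passed in as a callable so the sketch does not depend on the Arabic class from the question; with that class it could be a closure that calls `$Ar->ar2en()`.

```php
<?php
/**
 * Hypothetical helper: true when $text has at least one Arabic-script character.
 */
function containsArabic(string $text): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on
    // error (for example invalid UTF-8), so compare strictly against 1.
    return preg_match('/\p{Arabic}/u', $text) === 1;
}

/**
 * Hypothetical helper: builds the <h3> markup from the question,
 * transliterating only titles that contain Arabic script.
 */
function renderTitle(string $title, callable $transliterate): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    if (!containsArabic($title)) {
        return "<h3>$safeTitle</h3>";
    }

    $latin = htmlspecialchars($transliterate($title), ENT_QUOTES, 'UTF-8');
    return "<h3>$safeTitle <em>($latin)</em></h3>";
}
```

Because the transliterator is only called for titles that match, Latin-only titles skip the transformation entirely, which was the concern raised in the question.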

The PHP manual mentions: "Extended properties such as "Greek" or "InMusicalSymbols" are not supported by PCRE." But that's not entirely true anymore. PCRE release 6.5 added support for script names.
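To illustrate script names in general, here is a small sketch that reports which of a few scripts appear in a string. The function name `scriptsIn` and the default list of scripts are assumptions chosen for the example; it relies on the PCRE build bundled with PHP having Unicode property support, which is what makes `\p{Arabic}` work in the first place.

```php
<?php
/**
 * Hypothetical helper: returns the names from $scripts that occur in $text.
 * Each name must be a script name PCRE knows; an unknown name makes
 * preg_match() emit a warning and return false, so it is never reported.
 */
function scriptsIn(
    string $text,
    array $scripts = ['Arabic', 'Latin', 'Greek', 'Cyrillic', 'Hebrew']
): array {
    $found = [];
    foreach ($scripts as $script) {
        if (preg_match('/\p{' . $script . '}/u', $text) === 1) {
            $found[] = $script;
        }
    }
    return $found;
}
```

Digits, spaces and most punctuation belong to the Common script, so they do not count toward any of the named scripts.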
