str_word_count() function doesn't display Arabic language properly


Problem description

I wrote the following function to return a specific number of words from a text:

function brief_text($text, $num_words = 50) {
    $words = str_word_count($text, 1);
    $required_words = array_slice($words, 0, $num_words);
    return implode(" ", $required_words);
}

It works well with English, but when I use it with Arabic text it fails and doesn't return the words as expected. For example:

$text_en = "Cairo is the capital of Egypt and Paris is the capital of France";
echo brief_text($text_en, 10);

will output "Cairo is the capital of Egypt and Paris is the", while

$text_ar = "القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا";
echo brief_text($text_ar, 10); 

will output � � � � � � � � � �.

I know that the problem is with the str_word_count function but I don't know how to fix it.
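
A likely cause, sketched here rather than offered as a definitive diagnosis: str_word_count() works on single bytes and takes its idea of a "letter" from the current locale (the PHP manual notes that multibyte locales are not supported), while each Arabic letter in UTF-8 occupies two bytes. The snippet below only illustrates that byte/character mismatch; it assumes the file is saved as UTF-8 and that the mbstring extension is loaded.

```php
<?php
// Assumption: this source file is UTF-8 encoded and mbstring is available.
$word = "القاهرة";

echo strlen($word), PHP_EOL;              // number of bytes: 14
echo mb_strlen($word, 'UTF-8'), PHP_EOL;  // number of characters: 7

// Cutting the string at a byte boundary splits a character in half;
// the resulting one-byte piece is not valid UTF-8 on its own.
$first_byte = substr($word, 0, 1);
var_dump(mb_check_encoding($first_byte, 'UTF-8')); // bool(false)
```

Any function that slices such a string byte by byte can therefore hand back fragments that a browser renders as replacement characters.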

UPDATE

I have already written another function that works well with both English and Arabic, but I was looking for a fix for the problem caused by str_word_count() when it is used with Arabic. Anyway, here is my other function:

    function brief_text($string, $number_of_required_words = 50) {
        $string = trim(preg_replace('/\s+/', ' ', $string));
        $words = explode(" ", $string);
        $required_words = array_slice($words, 0, $number_of_required_words); // get a specific number of elements from the array
        return implode(" ", $required_words);
    }
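
The same whitespace-splitting idea can be written as a single preg_split() call. This is a minimal sketch, not a replacement for the function above: the name brief_text_ws() is hypothetical, and it assumes the input is valid UTF-8.

```php
<?php
// brief_text_ws() is a hypothetical name used only for this illustration.
function brief_text_ws(string $text, int $num_words = 50): string
{
    // The "u" modifier makes PCRE treat the pattern and subject as UTF-8.
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or trailing whitespace.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return ''; // preg_split() failed, e.g. the input was not valid UTF-8
    }
    return implode(' ', array_slice($words, 0, $num_words));
}

echo brief_text_ws("القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا", 3), PHP_EOL;
// prints: القاهرة هى عاصمة
```

Like the explode() version, this keeps punctuation attached to the neighbouring word, which differs from what str_word_count() does for English text.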

Solution

Try this function for counting words:

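// You can name the function as you like
if (!function_exists('mb_str_word_count'))
{
    // $charlist is kept to mirror the signature of str_word_count() but is not used.
    function mb_str_word_count($string, $format = 0, $charlist = '[]') {
        mb_internal_encoding('UTF-8');
        mb_regex_encoding('UTF-8');

        // Split on runs of characters outside the Arabic block (U+0600 to U+06FF),
        // then drop empty pieces left by leading or trailing separators.
        $words = mb_split('[^\x{0600}-\x{06FF}]+', $string) ?: [];
        $words = array_values(array_filter($words, 'strlen'));

        // Format 0 returns the number of words; any other format returns the word list.
        return (int) $format === 0 ? count($words) : $words;
    }
}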



echo mb_str_word_count("القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا") . PHP_EOL;
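
To connect this back to the original brief_text() function, here is a hedged sketch that extracts words with a Unicode-aware pattern so that both the English and the Arabic example work. The name brief_text_unicode() is hypothetical, the input is assumed to be valid UTF-8, and the pattern uses PCRE's \p{L} (letters) and \p{M} (combining marks) classes instead of the Arabic-only range above.

```php
<?php
// brief_text_unicode() is a hypothetical name used only for this illustration.
function brief_text_unicode(string $text, int $num_words = 50): string
{
    // \p{L} matches a letter in any script; \p{M} matches combining marks
    // such as Arabic diacritics. The "u" modifier enables UTF-8 handling.
    if (!preg_match_all('/[\p{L}\p{M}]+/u', $text, $matches)) {
        return ''; // no words found, or the input was not valid UTF-8
    }
    return implode(' ', array_slice($matches[0], 0, $num_words));
}

$text_en = "Cairo is the capital of Egypt and Paris is the capital of France";
$text_ar = "القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا";

echo brief_text_unicode($text_en, 10), PHP_EOL;
// prints: Cairo is the capital of Egypt and Paris is the
echo brief_text_unicode($text_ar, 3), PHP_EOL;
// prints: القاهرة هى عاصمة
```

One design difference worth noting: this pattern treats apostrophes and hyphens as separators, whereas str_word_count() allows them inside a word.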

Resources

Recommendations

  • Use the tag <meta charset="UTF-8"/> in HTML files
  • Always send a Content-Type: text/html; charset=utf-8 header when serving pages
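
As an illustration of the two recommendations, a page could declare UTF-8 both in the HTTP header and in the markup. This is a minimal sketch: utf8_page() is a hypothetical helper, and the surrounding HTML is only a placeholder.

```php
<?php
// utf8_page() is a hypothetical helper that wraps a text body in a small HTML page.
function utf8_page(string $body): string
{
    // Escape the body as UTF-8 so multibyte characters pass through intact.
    $safe = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
    return '<!DOCTYPE html><html><head><meta charset="UTF-8"></head>'
         . '<body><p>' . $safe . '</p></body></html>';
}

// header() must be called before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo utf8_page("القاهرة هى عاصمة مصر");
```

Declaring the encoding in both places means the browser does not have to guess it.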
