str_word_count() function doesn't display Arabic language properly
I've written the following function to return a specific number of words from a text:
function brief_text($text, $num_words = 50) {
    $words = str_word_count($text, 1);
    $required_words = array_slice($words, 0, $num_words);
    return implode(" ", $required_words);
}
It works well with English text, but when I use it with Arabic text it fails and doesn't return the words as expected. For example:
$text_en = "Cairo is the capital of Egypt and Paris is the capital of France";
echo brief_text($text_en, 10);
will output "Cairo is the capital of Egypt and Paris is the", while
$text_ar = "القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا";
echo brief_text($text_ar, 10);
will output "� � � � � � � � � �".
I know that the problem lies in the str_word_count() function, but I don't know how to fix it.
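A likely explanation, offered as an assumption rather than a confirmed diagnosis: the PHP manual describes str_word_count() as locale dependent and notes that multibyte locales are not supported, so the function effectively works byte by byte. Each Arabic letter occupies two bytes in UTF-8, and any byte the function accepts on its own becomes a separate, invalid "word" that renders as �. The short sketch below only illustrates the byte/character mismatch; it assumes the mbstring extension is loaded and that the source file is saved as UTF-8.

```php
<?php
$word = "القاهرة"; // 7 Arabic letters

// strlen() counts bytes, mb_strlen() counts characters in the given encoding.
$bytes = strlen($word);
$chars = mb_strlen($word, 'UTF-8');

echo "bytes: $bytes, characters: $chars", PHP_EOL; // bytes: 14, characters: 7
```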
UPDATE
I have already written another function that works well with both English and Arabic, but I was looking for a fix for the problem that str_word_count() causes when used with Arabic. Anyway, here is my other function:
function brief_text($string, $number_of_required_words = 50) {
    // Collapse runs of whitespace into single spaces, then split on spaces
    $string = trim(preg_replace('/\s+/', ' ', $string));
    $words = explode(" ", $string);
    $required_words = array_slice($words, 0, $number_of_required_words); // get a specific number of elements from the array
    return implode(" ", $required_words);
}
Try this function for counting words:
// You can name the function whatever you like
if (!function_exists('mb_str_word_count')) {
    // Splits on every character outside the Arabic block (U+0600 to U+06FF).
    // $charlist mirrors the str_word_count() signature but is not used here.
    function mb_str_word_count($string, $format = 0, $charlist = '[]') {
        mb_internal_encoding('UTF-8');
        mb_regex_encoding('UTF-8');
        $words = mb_split('[^\x{0600}-\x{06FF}]', $string);
        switch ($format) {
            case 0:
                // Number of words
                return count($words);
            case 1:
            case 2:
            default:
                // Array of words
                return $words;
        }
    }
}
echo mb_str_word_count("القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا") . PHP_EOL;
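Note that the function above treats every character outside the Arabic block as a separator, so it does not return English words. As an alternative sketch (not part of the original answer), the following uses preg_match_all() with the "u" modifier and Unicode property classes so that words in either script are matched. The name brief_text_unicode and the exact pattern are assumptions chosen for illustration, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// $num_words words of a UTF-8 string, for Arabic, English, or mixed text.
function brief_text_unicode(string $text, int $num_words = 50): string
{
    // \p{L} matches a letter in any script, \p{M} a combining mark (such as
    // Arabic diacritics); the "u" modifier makes PCRE read the input as UTF-8.
    // Apostrophes and hyphens are allowed inside a word, but not at its start.
    if (!preg_match_all("/\\p{L}[\\p{L}\\p{M}'-]*/u", $text, $matches)) {
        return ''; // no words found, or the input was not valid UTF-8
    }
    return implode(' ', array_slice($matches[0], 0, $num_words));
}

echo brief_text_unicode("القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا", 3), PHP_EOL;
echo brief_text_unicode("Cairo is the capital of Egypt", 3), PHP_EOL; // Cairo is the
```

Unlike splitting on whitespace, this approach drops punctuation and digits from the result, which is closer to what str_word_count() does for English text.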
Resources
- Unicode list for Arabic
- A Rule-Based Arabic Stemming Algorithm
- A Rule and Template Based Stemming Algorithm for Arabic Language (seems more complete)
Recommendations
- Use the tag <meta charset="UTF-8"/> in HTML files
- Always add Content-type: text/html; charset=utf-8 headers when serving pages