Using str_word_count for UTF8 texts
Question
I have this text:
$text = "Başka, küskün otomobil kaçtı buraya küskün otomobil neden kaçtı
kaçtı buraya, oraya KISMEN @here #there J.J.Johanson hep.
Danny:Where is mom? I don't know! Café est weiß for 2 €uros.
My 2nd nickname is mike18.";
Currently I am using this:
$a1= array_count_values(str_word_count($text, 1, 'ÇçÖöŞşİIıĞğÜü@#é߀1234567890'));
arsort($a1);
You can check with this fiddle:
http://ideone.com/oVUGYa
But this solution doesn't solve all UTF8 problems. I can't pass the whole UTF8 character set to str_word_count as a parameter.
So I created this:
$wordsArray = explode(" ",$text);
foreach ($wordsArray as $k => $w) {
$wordsArray[$k] = str_replace(array(",","."),"",$w);
}
$wordsArray2 = array_count_values($wordsArray);
arsort($wordsArray2);
The output should look like this:
Array (
[kaçtı] => 3
[küskün] => 2
[buraya] => 2
[@here] => 1
[#there] => 1
[Danny] => 1
[mom] => 1
[don't] => 1
[know] => 1
...
...
)
This works well, but it doesn't cover all sentence/word problems. For example, I removed commas and dots with str_replace.
For example, this solution doesn't handle text like: Hello Mike,how are you ?
Without a space after the comma, Mike and how won't be treated as separate words.
This isn't covered in the str_word_count solution: KISMEN @here #there
The @ and # signs won't be taken into consideration.
J.J.Johanson isn't covered either: although it is one word, it will be treated as JJJohanson.
Question marks and exclamation marks should be removed from words.
Is there a better way to get str_word_count behaviour with UTF8 support? The $text at the top of this question is my reference.
(It would be better if you could provide a fiddle with your answer.)
Answer
You will never have a perfect word-count solution, because in some languages the concept of a word either does not exist or is hard to pin down. Whether the text is UTF8 or not does not matter.
Japanese and Chinese are not space-tokenized languages. They don't even have a static word list; you have to read the whole sentence before you can identify the verbs and nouns.
If you want to support multiple languages, you will need a language-specific tokenizer engine. You can research full-text indexing, tokenizers, CJK tokenizers, and CJK analyzers for more information.
If you only want to support a limited set of selected languages, just improve your regex patterns to handle more and more cases.
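As a starting point, here is a minimal sketch of that regex approach using PCRE's Unicode property classes with the /u modifier. The pattern is an assumption tuned to the $text from the question: it keeps letters (\p{L}), combining marks (\p{M}), digits (\p{N}), apostrophes, the € sign, a leading @ or #, and inner dots (for J.J.Johanson), then strips a trailing dot (so "hep." counts as "hep").

```php
<?php
$text = "Başka, küskün otomobil kaçtı buraya küskün otomobil neden kaçtı
kaçtı buraya, oraya KISMEN @here #there J.J.Johanson hep.
Danny:Where is mom? I don't know! Café est weiß for 2 €uros.
My 2nd nickname is mike18.";

// Optional leading @ or #, then one word character, then more word
// characters where dots are also allowed inside the word.
preg_match_all("/[#@]?[\p{L}\p{M}\p{N}'€][\p{L}\p{M}\p{N}'.€]*/u", $text, $matches);

// Drop trailing sentence dots but keep inner ones (J.J.Johanson).
$words = array_map(fn($w) => rtrim($w, '.'), $matches[0]);

$counts = array_count_values($words);
arsort($counts);
print_r($counts);
```

This splits "Danny:Where" at the colon, keeps "don't", "@here", "#there", and "J.J.Johanson" intact, and counts "kaçtı" three times, matching the expected output in the question. It is not a complete solution; extend the character class as you meet new cases.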