str_word_count() for non-Latin words?


Problem description

I'm trying to count the number of words in a variable written in a non-Latin language (Bulgarian), but it seems that str_word_count() does not count non-Latin words. The PHP file is encoded as UTF-8.

$str = "текст на кирилица";
echo 'Number of words: '.str_word_count($str);
//this returns 0

Solution

You can do it with a regex:

$str = "текст на кирилица";
echo 'Number of words: '.count(preg_split('/\s+/', $str));

Here I'm defining the word delimiter as whitespace characters. If anything else should be treated as a word delimiter, you'll need to add it to the regex.
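As a rough sketch of adding more delimiters, the pattern below also treats a few ASCII punctuation marks as separators. The helper name countWordsByDelimiters and the chosen punctuation set are assumptions for illustration, not part of the original answer. The PREG_SPLIT_NO_EMPTY flag is used so that leading or trailing delimiters do not produce empty pieces that would inflate the count.

```php
<?php
// Hypothetical helper (not from the original answer): counts words by
// splitting on whitespace plus a few ASCII punctuation marks.
function countWordsByDelimiters(string $text): int
{
    // PREG_SPLIT_NO_EMPTY drops the empty pieces produced by leading,
    // trailing, or adjacent delimiters.
    $parts = preg_split('/[\s,.;:!?]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return count($parts);
}

echo 'Number of words: ' . countWordsByDelimiters(" текст, на кирилица! ");
// prints "Number of words: 3"
```

All delimiters here are single-byte ASCII characters, so the pattern is written without the /u modifier, as in the snippet above.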

Also note that since the regex itself (as opposed to the string) contains no UTF-8 characters, the /u modifier isn't required. But if you want some UTF-8 characters to act as delimiters, you'll need to add this modifier.
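To illustrate the case where /u does matter, here is a minimal sketch that uses guillemets (« ») as extra delimiters. The sample string is made up for this example. Both guillemets are two bytes long in UTF-8, so without /u the character class would match individual bytes rather than whole characters.

```php
<?php
// Hypothetical example: « and » are multi-byte in UTF-8, so the pattern
// needs /u to treat each of them as a single character. Without /u the
// class would match single bytes such as 0xBB, which is also the second
// byte of the letter "л" (0xD0 0xBB), and words would be cut apart.
$str = "«текст» на «кирилица»";
$words = preg_split('/[\s«»]+/u', $str, -1, PREG_SPLIT_NO_EMPTY);

echo 'Number of words: ' . count($words);
// prints "Number of words: 3"
```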

Update:

If you want only runs of Cyrillic letters to be counted as words, you may use:

$str = "текст 
на 12453
кирилица";
echo 'Number of words: '.count(preg_split('/[^А-Яа-яЁё]+/u', $str));
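One caveat with the split-based version: if the string starts or ends with a non-Cyrillic character, preg_split() returns empty first or last elements and count() overstates the result. A possible alternative, sketched here with a hypothetical helper name countCyrillicWords, is to count the runs of Cyrillic letters directly with preg_match_all(), using the same character class as above.

```php
<?php
// Hypothetical helper (not from the original answer): counts runs of
// Cyrillic letters directly instead of splitting on everything else.
function countCyrillicWords(string $text): int
{
    // preg_match_all() returns the number of full matches, or false on error.
    $count = preg_match_all('/[А-Яа-яЁё]+/u', $text);

    return $count === false ? 0 : $count;
}

echo 'Number of words: ' . countCyrillicWords("12453 текст на кирилица.");
// prints "Number of words: 3"
```

For the same input, count(preg_split('/[^А-Яа-яЁё]+/u', ...)) would report 5, because the leading digits and the trailing period each leave an empty element in the result.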
