How to do word counts for a mixture of English and Chinese in JavaScript


Question

I want to count the number of words in a passage that contains both English and Chinese. For English, it's simple: each whitespace-separated word counts as one word. For Chinese, each character counts as a word, so 香港人 is three words here.

So, for example, "I am a 香港人" should have a word count of 6.

Thanks!

Recommended answer

Try a regex like this:

/[\u00ff-\uffff]|\S+/g

For example, "I am a 香港人".match(/[\u00ff-\uffff]|\S+/g) gives:

["I", "am", "a", "香", "港", "人"]

Then you can just check the length of the resulting array.

The \u00ff-\uffff part of the regex is a Unicode character range; you probably want to narrow it down to just the characters you want to count as words. For example, CJK Unified Ideographs would be \u4e00-\u9fcc.
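As a rough sketch of that narrowing, the version below swaps the broad range for the CJK Unified Ideographs block. The function name `countWordsCjk` is made up for illustration, and the upper bound `\u9fff` (the end of the block) is an assumption chosen here in place of `\u9fcc`; adjust the range to whatever characters you actually want counted individually.

```javascript
// Assumption: only CJK Unified Ideographs (U+4E00 to U+9FFF) count as one word each.
// Anything else is counted as one word per run of non-whitespace characters.
var CJK_OR_TOKEN = /[\u4e00-\u9fff]|\S+/g;

// Hypothetical helper name, not part of the original answer.
function countWordsCjk(str) {
    var matches = str.match(CJK_OR_TOKEN);
    return matches ? matches.length : 0;
}

console.log(countWordsCjk("I am a 香港人")); // 6
```

With the narrower range, accented or extended Latin letters such as "Ō" are no longer split off as separate one-character words, which the broad \u00ff-\uffff range would do.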

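// Count words: each character in \u00ff-\uffff is one word,
// and every other run of non-whitespace characters is one word.
function countWords(str) {
    var matches = str.match(/[\u00ff-\uffff]|\S+/g);
    // match() returns null when nothing matches, e.g. for an empty string.
    return matches ? matches.length : 0;
}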
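One caveat with `countWords` as written: when English letters sit directly next to Chinese characters with no space, as in "Hello世界", the `\S+` alternative swallows the whole run and it counts as one word. A possible variant, sketched below under the assumption that only CJK Unified Ideographs (`\u4e00-\u9fff`) should be counted per character, excludes that range from the "run" alternative. The name `countWordsMixed` is hypothetical.

```javascript
// Hypothetical variant: a run of characters that are neither whitespace
// nor CJK ideographs is one word, so "Hello世界" yields "Hello", "世", "界".
// Assumption: U+4E00 to U+9FFF is the set of per-character "words".
function countWordsMixed(str) {
    var matches = str.match(/[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+/g);
    return matches ? matches.length : 0;
}

console.log(countWordsMixed("Hello世界"));     // 3
console.log(countWordsMixed("I am a 香港人")); // 6
```

Punctuation is still counted as part of (or as) a word here, just as in the original regex; strip it beforehand if that matters for your counts.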
