How to do word counts for a mixture of English and Chinese in JavaScript
Question
I want to count the number of words in a passage that contains both English and Chinese. For English, it's simple: each space-separated word counts as one word. For Chinese, we count each character as a word. Therefore, 香港人 is three words here.
So for example, "I am a 香港人" should have a word count of 6.
Thanks!
Recommended answer
Try a regex like this:
/[\u00ff-\uffff]|\S+/g
For example:
"I am a 香港人".match(/[\u00ff-\uffff]|\S+/g)
gives:
["I", "am", "a", "香", "港", "人"]
Then you can just check the length of the resulting array.
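The \u00ff-\uffff part of the regex is a Unicode character range; you probably want to narrow it down to just the characters you want to count as words. As written, it also matches characters such as the full-width punctuation marks , and 。, each of which is then counted as a separate word. For example, the CJK Unified Ideographs block is \u4e00-\u9fff (\u9fcc was its last assigned code point as of Unicode 6.1).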
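A minimal sketch of the narrowed version, assuming only CJK Unified Ideographs (U+4E00 to U+9FFF) should be counted per character. The second alternative also excludes that range, so an English word written directly against Chinese text (such as "a香港人" with no space) is split off instead of being swallowed whole by \S+. The name countWordsCjk is hypothetical.

```javascript
// Hypothetical helper: each CJK Unified Ideograph counts as one word;
// any other run of characters that are neither whitespace nor CJK
// counts as one word.
function countWordsCjk(str) {
  const matches = str.match(/[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+/g);
  // match() returns null when nothing matches (e.g. an empty string)
  return matches ? matches.length : 0;
}

console.log(countWordsCjk("I am a 香港人")); // 6
console.log(countWordsCjk("I am a香港人"));  // 6
```

For comparison, the broader /[\u00ff-\uffff]|\S+/g pattern returns only 3 matches for "I am a香港人", because \S+ consumes "a香港人" in one piece.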
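// Each character in \u00ff-\uffff counts as one word; otherwise a whole
// run of non-whitespace characters counts as one word.
function countWords(str) {
  var matches = str.match(/[\u00ff-\uffff]|\S+/g);
  // match() returns null when there is nothing to match (e.g. an empty string)
  return matches ? matches.length : 0;
}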
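If the environment supports Unicode property escapes (ES2018; they require the u flag), \p{Script=Han} can stand in for a hand-written range. With the u flag the regex matches whole code points, so a Han character outside the Basic Multilingual Plane, such as U+20000 (CJK Extension B), counts once; without it, /[\u00ff-\uffff]|\S+/g matches each half of the surrogate pair separately and counts that character twice. This is a sketch under those assumptions, and countWordsHan is a hypothetical name. Note that standalone punctuation such as , still counts as its own word here.

```javascript
// Hypothetical helper using Unicode property escapes (needs the `u` flag).
function countWordsHan(str) {
  // Each Han character is one word; any other run of characters that are
  // neither whitespace nor Han is one word.
  const matches = str.match(/\p{Script=Han}|[^\s\p{Script=Han}]+/gu);
  return matches ? matches.length : 0;
}

console.log(countWordsHan("I am a 香港人"));  // 6
console.log(countWordsHan("\u{20000}"));     // 1 (one code point outside the BMP)
```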