Regular Expression for accurate word-count using JavaScript


Question

I'm trying to put together a regular expression for a JavaScript statement that accurately counts the number of words in a textarea.

One solution I found is as follows:

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b\w+\b/).length -1;

But this doesn't count any non-Latin characters (e.g. Cyrillic, Hangul); it skips over them completely.
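The behaviour described above follows from the fact that, without the Unicode-aware flags, `\w` is shorthand for `[A-Za-z0-9_]`, so text in other scripts produces no matches at all. A minimal sketch that reproduces it outside the browser (the helper name `countWithWordClass` is hypothetical and only wraps the expression from the question):

```javascript
// Hypothetical helper reproducing the first attempt from the question.
function countWithWordClass(text) {
  return text.split(/\b\w+\b/).length - 1;
}

const latin = countWithWordClass("hello world");
const cyrillic = countWithWordClass("привет мир");
const hangul = countWithWordClass("안녕하세요 세계");

console.log(latin, cyrillic, hangul); // 2 0 0
```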

Another one I put together:

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\s+/g).length -1;

But this doesn't count accurately unless the document ends in a whitespace character. If a space is appended to the value being counted, it reports 1 word even for an empty document. Furthermore, if the document begins with a space, an extraneous word is counted.
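These edge cases can be checked directly. The sketch below applies the same `split(/\s+/g)` expression to a few sample strings; the helper name `countWithSplit` and the sample inputs are assumptions made for illustration:

```javascript
// Hypothetical helper reproducing the second attempt from the question.
function countWithSplit(text) {
  return text.split(/\s+/g).length - 1;
}

// Empty, single space, no trailing space, trailing space, leading + trailing space.
const samples = ["", " ", "one two", "one two ", " one two "];
const counts = samples.map(countWithSplit);

console.log(counts); // [0, 1, 1, 2, 3]
```

Only the fourth sample, which ends in a space and does not start with one, gives the correct count of 2.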

Is there a regular expression I can put into this statement that counts the words accurately, regardless of input method?

Recommended answer

This should do what you're after:

// match() returns null when there are no words, so fall back to an empty array
(value.match(/\S+/g) || []).length;

Rather than splitting the string, you're matching on any sequence of non-whitespace characters.

There's the added bonus of being able to easily extract each word if needed ;)
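As a rough sketch of how counting and extraction might fit together, the answer's regex can be wrapped in two small functions. The names `extractWords` and `countWords` are hypothetical, and the `|| []` fallback is an addition here to cover empty or whitespace-only input:

```javascript
// Hypothetical wrapper around the answer's regex.
function extractWords(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(/\S+/g) || [];
}

function countWords(text) {
  return extractWords(text).length;
}

console.log(countWords(" привет мир "));       // 2
console.log(countWords(""));                   // 0
console.log(extractWords("one  two\nthree"));  // three entries: one, two, three
```

In the page from the question, the result would presumably be written to the counter with something like `countWords(document.querySelector("#editor").value)`. Note that this counts whitespace-separated tokens, so a stand-alone punctuation mark counts as a word, and text in scripts that do not separate words with spaces counts as one word per unbroken run.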
