Counting whole word/number occurrences with str_count in R

Published: 2021/7/6 20:40:14. Tags: r, regex, stringr

Question

Similar to this case, I would like to count how many times any of several words and numbers occur in each element of a vector of sentences, using str_count from the stringr package.

But I noticed that partial numbers are counted as well as whole numbers. For example:

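library(stringr)

# Sentences to search (a character vector, despite the name df)
df <- c("honda civic 1988 with new lights","toyota auris 4x4 140000 km","nissan skyline 2.0 159000 km")
keywords <- c("honda","civic","toyota","auris","nissan","skyline","1988","1400","159")
# Join the keywords into one alternation pattern and count matches per sentence
number_of_keywords_df <- str_count(df, paste(keywords, collapse='|'))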

Here I receive 3, 3, 3 for number_of_keywords_df, while clearly it should be 3, 2, 2. The str_count function seems to count the partial strings "1400" and "159" inside the numbers "140000" and "159000". Is there any way of preventing that?
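One way to confirm which substrings are being counted is to list the matches rather than count them. The following is a minimal sketch using stringr's str_extract_all on the same data; the variable name matches is introduced here for illustration only.

```r
library(stringr)

df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan", "skyline",
              "1988", "1400", "159")

# Extract every substring matched by the unanchored alternation,
# one character vector per sentence.
matches <- str_extract_all(df, paste(keywords, collapse = "|"))
matches
```

The second and third list elements should show "1400" and "159" as matches, which are the leading digits of "140000" and "159000" rather than standalone numbers.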

Recommended answer

Using sprintf, you can wrap each keyword in word boundaries (\\b) before joining the keywords into a single pattern:

number_of_keywords_df <- str_count(df, paste(sprintf("\\b%s\\b", keywords), collapse = '|'))
number_of_keywords_df

which yields

[1] 3 2 2
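An equivalent pattern can be built by placing the whole alternation in one group with a single pair of word boundaries around it. This is a sketch of an alternative, not part of the original answer; the names pattern and counts are illustrative. It assumes every keyword starts and ends with a letter, digit, or underscore and contains no regex metacharacters, since \\b only marks transitions between word and non-word characters.

```r
library(stringr)

df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan", "skyline",
              "1988", "1400", "159")

# One non-capturing group around the alternation, with a word boundary
# on each side, instead of a boundary pair per keyword.
pattern <- paste0("\\b(?:", paste(keywords, collapse = "|"), ")\\b")

# Count whole-word matches in each sentence.
counts <- str_count(df, pattern)
counts
```

For this data the counts should agree with the result above, because "1400" and "159" are followed by further digits in "140000" and "159000" and so fail the trailing boundary.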
