Count common words in two strings

Published: 2020/10/15 21:21:51 Tags: r string text-mining data-analysis


Question

I have two strings:

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"

I am looking to get a count of the common words between these strings.

The answer should be 3: "Roy", "travels", and "Africa" are the common words.

This is what I tried:

stra <- as.data.frame(t(read.table(textConnection(a), sep = " ")))
strb <- as.data.frame(t(read.table(textConnection(b), sep = " ")))

Taking unique values to avoid counting repeats:

stra_unique <-as.data.frame(unique(stra$V1))
strb_unique <- as.data.frame(unique(strb$V1))
colnames(stra_unique) <- c("V1")
colnames(strb_unique) <- c("V1")

common_words <-length(merge(stra_unique,strb_unique, by = "V1")$V1)

I need to do this for a data set with over 2000 and 1200 strings, so the total number of string comparisons is 2000 × 1200. Is there a quick way to do this without using loops?

Recommended answer

You can use strsplit and intersect from the base library:

> a <- "Roy lives in Japan and travels to Africa"
> b <- "Roy travels Africa with this wife"
> a_split <- unlist(strsplit(a, split = " "))
> b_split <- unlist(strsplit(b, split = " "))
> length(intersect(a_split, b_split))
[1] 3
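
The question also asks about comparing every string in one set against every string in another (roughly 2000 × 1200 pairs). The following is a minimal sketch of one way to extend the same strsplit/intersect idea to two character vectors. The helper name count_common_words and the second example vector are assumptions made for illustration and are not part of the original answer. Note that outer() with Vectorize() avoids writing an explicit loop, but it still evaluates each pair internally, so it is a convenience rather than a guaranteed speed-up.

```r
# Hypothetical helper (not from the original answer): returns a matrix whose
# [i, j] entry is the number of distinct words shared by x[i] and y[j].
count_common_words <- function(x, y) {
  # Split every string on single spaces once, up front.
  x_words <- strsplit(x, split = " ", fixed = TRUE)
  y_words <- strsplit(y, split = " ", fixed = TRUE)

  # outer() expects a vectorized function, so wrap the per-pair count.
  # intersect() already drops duplicates, so repeated words count once.
  pair_count <- Vectorize(function(i, j) {
    length(intersect(x_words[[i]], y_words[[j]]))
  })

  outer(seq_along(x_words), seq_along(y_words), pair_count)
}

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"

# Illustrative input: "Japan and Africa" is a made-up third string.
counts <- count_common_words(c(a, b), c(b, "Japan and Africa"))
counts[1, 1]  # shared words between a and b
```

Splitting each string only once, before the pairwise step, keeps the repeated work down to the intersect() calls. Matching here is case-sensitive and treats punctuation as part of a word, just as in the answer above.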
