Count multiple patterns using the str_count function in R

Question

I'm fairly new to R and struggling a bit with using the string_count function to count multiple words that are not known in advance and are held in a separate vector.

Now, I know how to detect a single instance of a pattern using the following code:

str_count(mydf$string, "Apples")

What I want to do is count multiple words (e.g. "Apples", "Pears", "Oranges", etc.) taken from a vector that is itself created from another data frame (e.g. by using Uniques <- unique(mydf1$words)).

The key thing here is that the words that appear in mydf1$words are entirely dependent on what data has been uploaded to R in the first place, as this will change from data set to data set.

The answer is probably pretty straightforward, but for the life of me I can't seem to work it out!

Solution

Do you mean the str_count function in the stringr package?

If so, it uses regular expressions and in the pattern for regular expressions the | character means "or", so str_count(mydf$string, 'apple|pear') will count the occurrences of "apple" or "pear" to give a total count. The string with the | characters can be constructed with paste, try:

str_count(mydf$string, paste(Uniques, collapse='|'))
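As a minimal sketch of the above, assuming the stringr package is installed: the data frames below are made-up stand-ins for mydf and mydf1, and the `whole` pattern is an optional extra that is not part of the original answer. It wraps the alternatives in word boundaries so that a word embedded in a longer word is not counted.

```r
library(stringr)

# Hypothetical sample data standing in for mydf and mydf1
mydf  <- data.frame(string = c("Apples and Pears",
                               "Oranges, Apples, Apples",
                               "Bananas"),
                    stringsAsFactors = FALSE)
mydf1 <- data.frame(words = c("Apples", "Pears", "Oranges", "Apples"),
                    stringsAsFactors = FALSE)

Uniques <- unique(mydf1$words)

# Join the words with "|" so the regular expression matches any one of them
pattern <- paste(Uniques, collapse = "|")
print(pattern)   # "Apples|Pears|Oranges"

# One total per element of mydf$string
counts <- str_count(mydf$string, pattern)
print(counts)    # 2 3 0

# Optional variant (an assumption, not from the answer): match whole words only
whole <- paste0("\\b(?:", pattern, ")\\b")
print(str_count("CrabApples Apples", pattern))  # 2, substring match included
print(str_count("CrabApples Apples", whole))    # 1, whole word only
```

If the words in Uniques can contain regex metacharacters such as "." or "+", they would need escaping before being pasted together, since the combined string is interpreted as a regular expression.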

You can see the string that paste constructs by running just that part of the code. Note that a pattern with a lot of alternatives may run very slowly. Another option would be to split each string into individual words and compare that vector of words with the vector of options using the %in% operator (then count the TRUE values).
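The split-and-compare alternative could look roughly like this in base R. It is only a sketch: the sample vectors are hypothetical, and splitting on runs of non-letter characters is an assumption about what counts as a word in the data.

```r
# Hypothetical sample data
strings <- c("Apples and Pears", "Oranges, Apples, Apples", "Bananas")
Uniques <- c("Apples", "Pears", "Oranges")

# Split each string into words on runs of non-letter characters
words <- strsplit(strings, "[^A-Za-z]+")

# For each string, count how many of its words appear in Uniques
counts <- vapply(words, function(w) sum(w %in% Uniques), integer(1))
print(counts)   # 2 3 0
```

Unlike the regular expression approach, this only counts exact whole-word matches, so "Apples" inside a longer word would not be counted.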
