Search for multiple values in a column in R

Problem description

I have a data frame with two columns:

df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c(
    "the cat is brown; the dog is barking; the bird is green and blue",
    "the dog is black; the bird is yellow and blue",
    "the bird is blue"
  ),
  stringsAsFactors = FALSE
)

For each row, I need the total number of times that row's "animals" occur across the entire "sentences" column.

For example, for the first row of "animals" ("cat; dog; bird"): sum_occurrences_sentences_column = (cat = 1) + (dog = 2) + (bird = 3) = 6.

The result will be a third column like this:

df <- cbind(sum_occurrences_sentences_column = c("6", "5", "3"), df)

I tried the following, but neither attempt works:

df[str_split(df$animals, ";") %in% df$sentences, ]

str_count(df$sentences, str_split(df$animals, ";"))

Any help would be appreciated :)

Recommended answer

Here's a base R solution:

First remove every ";" with gsub, then split the sentences column on spaces and unlist the result into a single vector of words:

split_sentence_column = unlist(strsplit(gsub(';','',df$sentences),' '))
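
To see why this word vector is enough to produce the per-animal counts quoted in the question, the sketch below tabulates it. It rebuilds the question's data frame so that it runs on its own; the names `words`, `word_counts` and `animal_counts` are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c(
    "the cat is brown; the dog is barking; the bird is green and blue",
    "the dog is black; the bird is yellow and blue",
    "the bird is blue"
  ),
  stringsAsFactors = FALSE
)

# Same idea as the answer: drop the ";" separators, then split on spaces.
words <- unlist(strsplit(gsub(";", "", df$sentences, fixed = TRUE), " ", fixed = TRUE))

# Count how often each distinct word occurs in the whole column,
# then look up the three animals by name.
word_counts <- table(words)
animal_counts <- word_counts[c("cat", "dog", "bird")]
print(animal_counts)
```

The counts for cat, dog and bird come out as 1, 2 and 3, which matches the breakdown in the question.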

Then loop over the rows. For each row, split the animals string into a vector, use %in% to flag which words from the sentences column appear in that row's animal list, and sum the TRUE cases. The result can be assigned directly to a new df column:

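for (i in seq_len(nrow(df))) {
  # Animals listed on row i, e.g. c("cat", "dog", "bird")
  animals = unlist(strsplit(df$animals[i], '; '))
  # Count the words in the whole sentences column that match any of them
  df$sum_occurrences_sentences_column[i] = sum(split_sentence_column %in% animals)
}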

> df
         animals                                                        sentences sum_occurrences_sentences_column
1 cat; dog; bird the cat is brown; the dog is barking; the bird is green and blue                                6
2      dog; bird                    the dog is black; the bird is yellow and blue                                5
3           bird                                                 the bird is blue                                3
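
The same computation can also be written without an explicit loop. The following is a minimal sketch of that variant using `vapply`, not the original answerer's code; `words` and `animal_lists` are illustrative names, and the data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data from the question.
df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c(
    "the cat is brown; the dog is barking; the bird is green and blue",
    "the dog is black; the bird is yellow and blue",
    "the bird is blue"
  ),
  stringsAsFactors = FALSE
)

# All words of the sentences column as one character vector.
words <- unlist(strsplit(gsub(";", "", df$sentences, fixed = TRUE), " ", fixed = TRUE))

# One character vector of animals per row.
animal_lists <- strsplit(df$animals, "; ", fixed = TRUE)

# For each row, count the words that match any of that row's animals.
df$sum_occurrences_sentences_column <- vapply(
  animal_lists,
  function(a) sum(words %in% a),
  integer(1)
)
print(df$sum_occurrences_sentences_column)
```

Both versions compare whole words exactly, so a token such as "birds" or "Bird" would not be counted as "bird". If the real data needs that, the words would have to be normalised first, for example with tolower.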
