Same calculations over different datasets


Question

I am a beginner in R and am trying to solve the following problem. I have 30 datasets to which I need to apply the same calculations. The datasets contain names, and for each dataset I have to find the names that appear in all of its columns. Every dataset has 4 columns. For simplicity, let's assume that I have the following 3 datasets:

# Note: data.frame() requires columns of equal length, so the shorter
# columns are padded with NA here (an assumption; the original post
# listed columns of 5 and 6 names, which data.frame() rejects).
df1 <- data.frame(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ", NA),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Alex", "MJ"),
                  x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ", "Tim"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Ben"))

df2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ", NA),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler", "MJ"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ", NA),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Tyler"))

df3 <- data.frame(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ", NA),
                  x2 = c("Lisa", "Paul", "Tim", "Linda", "Tyler", "MJ"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ", "Lisa"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Tyler"))

My idea was to first extract every unique name in each dataset (the names differ between datasets and sometimes occur several times within one) and then check whether each unique name is included in every column of that dataset. To that end, I have already combined all datasets into a list:

df_list<-list(df1,df2,df3)

Then I extracted the unique names in each dataset using:

unique_list <- lapply(df_list,  function(x) {
  as.vector(unique(unlist(x)))
})

This is where I get stuck. I do not know how to compare the list of unique names with each column of each dataset. For a single dataset, I would do it as follows:

u <- as.vector(unique(unlist(df1)))
# 1 if the name occurs in all four columns, 0 otherwise
n <- ifelse(u %in% df1$x1 & u %in% df1$x2 & u %in% df1$x3 &
              u %in% df1$x4, 1, 0)
Names_1 <- cbind(u, n) # names with a 1 are included in all columns of the dataset

Is there a nice way to do the above calculation for all datasets at once?

Thanks a lot!

Recommended answer

Try it this way:

library(tidyverse)
library(janitor)
df1<- data.frame(x1=c("Ben","Alex","Tim", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Alex"), 
                 x3=c("Tomas","Alex","Ben", "Paul", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

df2<- data.frame(x1=c("Alex","Tyler","Ben", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Tyler"), 
                 x3=c("Tyler","Alex","Ben", "Tyler", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

df3<- data.frame(x1=c("Lisa","Tyler","Ben", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Tyler"), 
                 x3=c("Tyler","Alex","Ben", "Tyler", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

df <- bind_cols(df1, df2, df3) %>% clean_names()

uniq_name <- df %>% 
  pivot_longer(everything(), names_to = NULL) %>% 
  distinct() %>% 
  pull()

map(uniq_name, ~ colSums(df == .x) >= 1) %>% 
  map_lgl(all) %>% 
  as_tibble() %>% 
  add_column(uniq_name) %>% 
  filter(value)

# A tibble: 1 x 2
  value uniq_name
  <lgl> <chr>    
1 TRUE  Ben 
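
Note that the code above binds all three data frames together, so it reports names found in every column across all datasets. If the goal is instead one result per dataset, as the question describes, a base R sketch along the following lines may help. It is only an illustration: `common_names` and `indicator_table` are hypothetical helper names, not part of any package, and the sample data is copied from the answer above.

```r
# Sample data copied from the answer above (5 names per column).
df1 <- data.frame(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Alex"),
                  x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
df2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))
df3 <- data.frame(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"))

# Hypothetical helper: names that occur in every column of one data frame.
# Each column is treated as a set, and the sets are intersected in turn.
common_names <- function(df) {
  Reduce(intersect, lapply(df, function(col) unique(as.character(col))))
}

# Hypothetical helper: 0/1 indicator table in the style of Names_1
# from the question, built for one data frame.
indicator_table <- function(df) {
  u <- unique(as.character(unlist(df)))
  # One logical vector per column, combined with element-wise AND.
  in_all <- Reduce(`&`, lapply(df, function(col) u %in% col))
  data.frame(name = u, in_all_columns = as.integer(in_all))
}

# Apply the same calculation to every dataset at once.
df_list <- list(df1 = df1, df2 = df2, df3 = df3)
common_list <- lapply(df_list, common_names)
indicator_list <- lapply(df_list, indicator_table)
```

With this sample data, `common_list` should contain Ben and Alex for `df1` and only Ben for `df2` and `df3`. Because everything goes through `lapply()`, the same two lines work unchanged for a list of 30 data frames.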
