In R, use nonstandard evaluation to select specific variables from data.frames

Question

I've got several large-ish data.frames set up like a relational database, and I'd like to make a single function to look for whatever variable I need and grab it from that particular data.frame and add it to the data.frame I'm currently working on. I've got a way to do this that works, but it requires temporarily making a list of all the data.frames, which seems inefficient. I suspect that nonstandard evaluation would solve this problem for me, but I'm not sure how to do it.

Here's what works but seems inefficient:

library(dplyr)  # provides %>%, filter(), select(), and left_join() used below

Table1 <- data.frame(ID = LETTERS[1:10], ColA = rnorm(10), ColB = rnorm(10),
                     ColC = rnorm(10))

Table2 <- data.frame(ID = LETTERS[1:10], ColD = rnorm(10), ColE = rnorm(10),
                     ColF = rnorm(10))

Table3 <- data.frame(ID = LETTERS[1:10], ColG = rnorm(10), ColH = rnorm(10),
                     ColI = rnorm(10))

Key <- data.frame(Table = rep(c("Table1", "Table2", "Table3"), each = 4),
                  ColumnName = c("ID", paste0("Col", LETTERS[1:3]),
                                 "ID", paste0("Col", LETTERS[4:6]),
                                 "ID", paste0("Col", LETTERS[7:9])))

# function for grabbing info from other tables
grab <- function(StartDF, ColNames){

      AllDFs <- list(Table1, Table2, Table3)
      names(AllDFs) <- c("Table1", "Table2", "Table3")

      # Determine which data.frames have that column
      WhichDF <- Key %>% filter(ColumnName %in% ColNames) %>% 
            select(Table)

      TempDF <- StartDF

      for(i in 1:length(ColNames)){
            ToAdd <- AllDFs[WhichDF[i, 1]]
            ToAdd <- ToAdd[[1]] %>% 
                  select(c(ColNames[i], ID))

            TempDF <- TempDF %>% left_join(ToAdd)
            rm(ToAdd)
      }

      return(TempDF)


}

grab(Table1, c("ColE", "ColH"))

What would be great instead would be something like this:

grab <- function(StartDF, ColNames){

      # Some function that returns the column names of all the data.frames
      # without me creating a new object that is a list of them

      # Some function that left_joins the correct data.frame plus the column
      # "ID" to my starting data.frame, again without needing to create that list 
      # of all the data.frames

}

Recommended answer

Instead of creating the list manually, we can use mget to fetch the objects whose names appear in the 'Table' column of the 'Key' dataset. mget takes a character vector of object names and returns the objects as a named list.
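As a minimal illustration of what mget returns on its own, here is a sketch with two toy data frames. The names DemoA, DemoB, and Found are hypothetical and are not part of the original question. It assumes the objects live in the global environment, as they do in the question.

```r
# Two toy data frames in the global environment (hypothetical names)
DemoA <- data.frame(ID = c("A", "B"), x = 1:2)
DemoB <- data.frame(ID = c("A", "B"), y = 3:4)

# mget() looks up each name in the given environment and
# returns the objects as a list named after the inputs
Found <- mget(c("DemoA", "DemoB"), envir = .GlobalEnv)

names(Found)                   # "DemoA" "DemoB"
identical(Found$DemoB, DemoB)  # TRUE
```

Because the result is an ordinary named list, it can be indexed or iterated over just like the hand-built AllDFs list in the question.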

library(dplyr)
library(purrr)
grab <- function(StartDF, ColNames){



     # filter the rows of Key based on the ColNames input
     # pull the Table column as a vector
     # column was factor, so convert to character class
     # return the value of the objects with mget in a list
     Tables <- Key %>% 
               filter(ColumnName %in% ColNames) %>% 
               pull(Table) %>%
               as.character %>%
               mget(envir = .GlobalEnv) 


      TempDF <- StartDF

      # use the same left_joins in a loop after selecting only the
      # ID and corresponding columns from 'ColNames'
      for(i in seq_along(ColNames)){
            ToAdd  <- Tables[[i]] %>%
                         select(ColNames[i], ID)          

            TempDF <- TempDF %>% 
                  left_join(ToAdd)
            rm(ToAdd)
      }

      TempDF


}

grab(Table1, c("ColE", "ColH"))


Another option is to use reduce:

grab <- function(StartDF, ColNames) {
     #only change is that instead of a for loop
     # use reduce with left_join after selecting the corresponding columns
     # with map
     Key %>%
       filter(ColumnName %in% ColNames) %>% 
       pull(Table) %>%
       as.character %>%
       mget(envir = .GlobalEnv)  %>%
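       map2(ColNames, ~ .x %>%
                     select(ID, all_of(.y))) %>%   # all_of() marks .y as a character vector of names
       append(list(StartDF), .)  %>%   # start from the data.frame passed in, not a hard-coded Table1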
       reduce(left_join)

   }

grab(Table1, c("ColE", "ColH"))
#   ID       ColA       ColB        ColC        ColE        ColH
#1   A -0.9490093  0.5177143 -1.91015491  0.07777086  1.86277670
#2   B -0.7182786 -1.1019146 -0.70802738 -0.73965230  0.18375660
#3   C  0.5064516 -1.6904354  1.11106206  2.04315508 -0.65365228
#4   D  0.9362477  0.5260682 -0.03419651 -0.51628310 -1.17104181
#5   E  0.5636047 -0.9470895  0.43303304 -2.95928629  1.86425049
#6   F  1.0598531  0.4144901  0.10239896  1.57681703 -0.05382603
#7   G  1.1335047 -0.8282173 -0.28327898  2.02917831  0.50768462
#8   H  0.2941341  0.3261185 -0.15528127 -0.46470035 -0.86561320
#9   I -2.1434905  0.6567689  0.02298549  0.90822132  0.64360337
#10  J  0.4291258  1.3410147  0.67544567  0.12466251  0.75989623
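One caveat: filtering Key with %in% returns tables in the order the rows appear in Key, not in the order of ColNames, so a call such as grab(Table1, c("ColH", "ColE")) would pair each table with the wrong column. The sketch below is one possible way around that, using match() to look up each column in the order requested. The function name grab_ordered, its key and envir arguments, the seed, and the cut-down toy tables are all assumptions made for illustration. It also assumes every table shares an ID column.

```r
library(dplyr)
library(purrr)

set.seed(1)  # arbitrary seed so the toy data are reproducible
Table1 <- data.frame(ID = LETTERS[1:10], ColA = rnorm(10))
Table2 <- data.frame(ID = LETTERS[1:10], ColE = rnorm(10))
Table3 <- data.frame(ID = LETTERS[1:10], ColH = rnorm(10))
Key <- data.frame(Table = c("Table1", "Table2", "Table3"),
                  ColumnName = c("ColA", "ColE", "ColH"))

# Hypothetical variant of grab() that does not depend on row order in the key
grab_ordered <- function(StartDF, ColNames, key = Key, envir = .GlobalEnv) {
  # match() gives the key row for each requested column, in ColNames order
  idx <- match(ColNames, key$ColumnName)
  if (anyNA(idx)) {
    stop("Not listed in key: ", paste(ColNames[is.na(idx)], collapse = ", "))
  }
  # fetch the tables by name, one per requested column
  tables <- mget(as.character(key$Table[idx]), envir = envir)
  # keep only ID plus the requested column from each table
  pieces <- map2(tables, ColNames, ~ select(.x, ID, all_of(.y)))
  # join each piece onto the starting data.frame in turn
  reduce(pieces, ~ left_join(.x, .y, by = "ID"), .init = StartDF)
}

res <- grab_ordered(Table1, c("ColH", "ColE"))
names(res)  # "ID" "ColA" "ColH" "ColE"
```

Passing by = "ID" explicitly also silences the "Joining with" message that left_join prints when it has to guess the key.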
