How can I dynamically create new variables/columns on databases in R using dplyr?


Question

I am new to Stack Overflow and quite new to R. I would really appreciate your help.

I am using dplyr's mutate() function to create a set of new columns based on one initial column. When the number of columns to be created is known in advance, everything works fine.

However, in my application the number of new columns is not known when I write the code; it is supplied as an input parameter before the code runs.

For illustration, consider the following minimal working example:

library(RSQLite)
library(dplyr)
library(dbplyr)
library(DBI)

# RSQLite's argument is `dbname`; `path` is silently ignored
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

copy_to(con, mtcars, "mtcars", temporary = FALSE)

db <- tbl(con, "mtcars") %>%
    select(carb) %>%
    distinct(carb) %>%
    arrange(carb) %>%
    mutate(carb1 = carb + 1) %>%
    mutate(carb2 = carb + 2) %>%
    mutate(carb3 = carb + 3) %>%
    show_query() %>%
    collect()

In this example, I create three new variables. However, I want the program to work with a dynamic number of variables (e.g., five or ten new variables). I would also like to do all of the calculations before collect(), because I want to copy the data into memory as late as possible.

Some background on my real-life application: I want to use DB2's ADD_MONTHS() function, so I need dplyr/dbplyr to pass that function directly into the generated SQL. I therefore need a solution that does not rely on in-memory data frame logic; it has to work within dplyr on the database table.

From a different perspective: in SAS I would use the macro processor to dynamically build a proc sql statement. Is there an equivalent in R?

Recommended answer

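We can use map_dfc() from purrr:

library(dplyr)
library(purrr)
library(stringr)
# example input, inferred from the output shown below
df <- data.frame(x = 1:3)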
map_dfc(1:3, ~ df %>%
                  transmute(!! str_c('x', .x) := x + .x)) %>%
    bind_cols(df, .)
#  x x1 x2 x3
#1 1  2  3  4
#2 2  3  4  5
#3 3  4  5  6


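For the database, collect() first and then add the columns. Note that the offsets here are the distinct carb values themselves (1, 2, 3, 4, 6, 8), not 1:3: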

dat <- tbl(con, "mtcars") %>%
        select(carb) %>%
        distinct(carb) %>%
        arrange(carb) %>%
        collect()
map_dfc(dat$carb, ~ dat %>%
                      transmute(!! str_c('carb', .x) := carb + .x)) %>%
    bind_cols(dat, .)
# A tibble: 6 x 7
#   carb carb1 carb2 carb3 carb4 carb6 carb8
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     2     3     4     5     7     9
#2     2     3     4     5     6     8    10
#3     3     4     5     6     7     9    11
#4     4     5     6     7     8    10    12
#5     6     7     8     9    10    12    14
#6     8     9    10    11    12    14    16
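
One caveat of collecting first is that the arithmetic then runs in R, so a backend-only function such as ADD_MONTHS() is not available. As a minimal sketch of why staying lazy matters, the snippet below shows dbplyr passing a function it has no translation for straight through to the SQL. It assumes a simulated connection via lazy_frame() and simulate_dbi() rather than a real DB2 backend, and start_date and plus_1 are hypothetical column names.

```r
library(dplyr)
library(dbplyr)

# Lazy table backed by a simulated connection; no database is needed.
# `start_date` is a hypothetical column used only for illustration.
lf <- lazy_frame(start_date = "2020-01-01", con = simulate_dbi())

# dbplyr has no translation for ADD_MONTHS, so the call is written
# into the generated SQL unchanged.
sql_text <- lf %>%
  mutate(plus_1 = ADD_MONTHS(start_date, 1L)) %>%
  sql_render() %>%
  as.character()

cat(sql_text, "\n")
```

The rendered statement should contain the ADD_MONTHS call as written; whether it then executes depends on the real backend supporting that function.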


Alternatively, if we want to do this before collecting, another option is to build the expressions as text, parse them, and splice them into mutate():

tbl(con, "mtcars") %>%
   select(carb) %>%
   distinct(carb) %>%
   arrange(carb) %>%
   mutate(!!! rlang::parse_exprs(str_c('carb', 1:3, sep="+", collapse=";"))) %>%
   rename_at(-1, ~ str_c('carb', 1:3)) %>%
   show_query() %>%
   collect()
#<SQL>
#SELECT `carb`, `carb` + 1.0 AS `carb1`, `carb` + 2.0 AS `carb2`, `carb` + 3.0 AS `carb3`
#FROM (SELECT *
#FROM (SELECT DISTINCT *
#FROM (SELECT `carb`
#FROM `mtcars`))
#ORDER BY `carb`)
# A tibble: 6 x 4
#   carb carb1 carb2 carb3
#  <dbl> <dbl> <dbl> <dbl>
#1     1     2     3     4
#2     2     3     4     5
#3     3     4     5     6
#4     4     5     6     7
#5     6     7     8     9
#6     8     9    10    11
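
A variant of the same idea, offered as a sketch rather than as part of the original answer: instead of parsing strings and renaming afterwards, build a named list of unevaluated expressions with rlang::expr() and splice it into mutate() with !!!. The names of the list become the column names. The parameter n_new is a hypothetical stand-in for the input that controls how many columns to create, and an in-memory SQLite database stands in for the real backend.

```r
library(dplyr)
library(dbplyr)
library(rlang)

# In-memory SQLite database standing in for the real backend
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)

n_new <- 5  # hypothetical input parameter: number of columns to create

# One unevaluated expression per new column: carb + 1, carb + 2, ...
new_cols <- lapply(seq_len(n_new), function(i) expr(carb + !!i))
names(new_cols) <- paste0("carb", seq_len(n_new))

# Everything up to collect() is translated to SQL and runs in the database
query <- tbl(con, "mtcars") %>%
  distinct(carb) %>%
  mutate(!!!new_cols)

show_query(query)

result <- query %>%
  arrange(carb) %>%
  collect()

DBI::dbDisconnect(con)
```

Changing n_new changes the number of generated columns without touching the pipeline. The same pattern should carry over to backend functions by swapping the body of expr(), for example expr(ADD_MONTHS(some_date, !!i)) with some_date being whatever date column the real table has.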
