dplyr-使用正则表达式的多个列的总和 [英] dplyr - sum of multiple columns using regular expressions

查看:109
本文介绍了dplyr-使用正则表达式的多个列的总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于数据集mtcars2

For the dataset mtcars2

mtcars2 = mtcars
mtcars2 = mtcars2 %>% mutate(cyl9=cyl, disp9=disp, gear2=gear)

我想获得一个新列,该列为多行,通过使用正则表达式捕获模式。

I want to get a new column which is the sum of multiple columns, by using regular expressions to capture the pattern.

这是一个解决方案,但这是通过硬编码完成的。

This is a solution, however this is done by hard-coding

select(mtcars2, cyl9) + select(mtcars2, disp9) + select(mtcars2, gear2)

我尝试过类似的方法,但是它给了我一个数字而不是向量

I tried something like this but it gives me a number instead of a vector

mtcars2 %>% select(matches("[0-9]")) %>% sum

请仅使用dplyr解决方案,因为稍后我需要将这些函数应用于sql表。

Please dplyr solutions only, since i need to apply these functions to a sql table later on.

谢谢!

更新..
我需要解决方案来处理sql表,数据设置如下。.

Update.. I need the solution to work on sql tables, data setup as follow..

mydb <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(mydb, "mt", mtcars)
mt.sql=tbl(mydb, "mt")
mt.sql = mt.sql %>% mutate(cyl9=cyl, disp9=disp, gear2=gear)

reduce(),rowSums(),rowwise()在sql表上不起作用,我尝试过这些并且他们给了我错误。

reduce(), rowSums(), rowwise() does not work on sql tables, ive tried those and they give me errors.

我尝试过,

mt.sql %>% rowwise()

错误:is.data.frame(data)不是TRUE

Error: is.data.frame(data) is not TRUE

mt.sql %>% select(matches("[0-9]")) %>% mutate(sum=rowSums(.))

UseMethod( escape)中的错误:
没有适用于 c('tbl_dbi','tbl_sql','tbl_lazy','tbl')类的对象的'escape'适用方法

Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "c('tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

mt.sql %>% select(matches("[0-9]")) %>% reduce(`+`)

.x + .y错误:二进制运算符的非数字参数

Error in .x + .y : non-numeric argument to binary operator

如果我切换了mt .sql到mtcars2,它们都可以工作,所以我想这是sql表问题。

If i switch mt.sql to mtcars2, they all work, so i guess this is a sql table issue.

推荐答案

考虑到SQL约束阻止使用更简单,更优雅的解决方案,例如 rowSums reduce ,我提供了一个更容易理解的答案,使我们回到了更基本的 new_col = a + b + c + ... + n

Considering that the SQL constraint prevents use of more simple and elegant solutions such as rowSums and reduce, I offer a more hack-y answer that brings us back to the more basic new_col = a + b + c + ... + n

library(dplyr)
library(stringr)

# get the variable names and form a text equation
col_eqn <- paste0(str_subset(colnames(mtcars), "[a-z]", collapse = " + ")

# run a normal  mutate function parsing and evaluating the equation
mtcars %>% mutate(new_col = eval(parse(text = col_eqn)))

# mpg cyl  disp  hp drat    wt  qsec vs am gear carb new_col
# 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 328.980
# 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 329.795
# 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 259.580
# 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 426.135
# 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 590.310
# 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 385.540
# 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 656.920
# 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 270.980
# 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 299.570
# 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 350.460

这篇关于dplyr-使用正则表达式的多个列的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆