Using mutate rowwise over a subset of columns

Question

I am trying to create a new column that will contain a result of calculations done rowwise over a subset of columns of a tibble, and add this new column to the existing tibble. Like so:

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

I effectively want to do a dplyr equivalent of this code from base R:

df$SumA <- rowSums(df[,grepl("^A", colnames(df))])

My problem is that this doesn't work:

df %>%
  select(starts_with("A")) %>%
  mutate(SumA = rowSums(.))
    # some code here

...because I got rid of the "ID" column in order to let mutate run the rowSums over the other (numerical) columns. I have tried to cbind or bind_cols in the pipe after the mutate, but it doesn't work. None of the variants of mutate work, because they work in-place (within each cell of the tibble, and not across the columns, even with rowwise).
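For comparison, dplyr 1.0.0 (released after this question was asked) added `c_across()`, which lets `rowwise()` combine the columns chosen by a select helper within each row while keeping `ID` in place. A minimal sketch, assuming dplyr >= 1.0.0 is installed; the name `df_rowwise` is only illustrative:

```r
library(dplyr)

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

# rowwise() makes each row its own group; c_across() gathers the values
# of the columns matched by starts_with("A") for the current row.
# Assumes dplyr >= 1.0.0, where c_across() exists.
df_rowwise <- df %>%
  rowwise() %>%
  mutate(SumA = sum(c_across(starts_with("A")))) %>%
  ungroup() # drop the row-wise grouping again

print(df_rowwise)
```

Nothing is deselected here, so `ID` survives; the select helper is only used inside `c_across()`. Because `sum()` is called once per row, this pattern is better suited to functions that are not already vectorised.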

This does work, but doesn't strike me as an elegant solution:

df %>%
  mutate(SumA = rowSums(.[, grepl("^A", colnames(df))]))

Is there any tidyverse-based solution that does not require grepl or square brackets but only more standard dplyr verbs and parameters?

My expected output is:

df_out <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3),
  SumA = c(6, 6, 6)
)

Best kJ

Recommended answer

Here's one way to approach row-wise computation in the tidyverse using purrr::pmap. This is best used with functions that actually need to be run row by row; simple addition could probably be done a faster way. Basically we use select to provide the input list to pmap, which lets us use the select helpers such as starts_with or matches if you need regex.

library(tidyverse)
df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

df %>%
  mutate(
    SumA = pmap_dbl(
      .l = select(., starts_with("A")),
      .f = function(...) sum(...)
    )
  )
#> # A tibble: 3 x 5
#>   ID       A1    A2    A3  SumA
#>   <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 one       1     2     3     6
#> 2 two       1     2     3     6
#> 3 three     1     2     3     6

Created on 2019-01-30 by the reprex package (v0.2.1)
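As the answer points out, plain addition does not need per-row evaluation. A vectorised sketch, assuming dplyr >= 1.1.0, which provides `pick()` for grabbing a subset of columns inside `mutate()` (on dplyr 1.0.x, `across(starts_with("A"))` is the usual stand-in); `df_fast` is an illustrative name:

```r
library(dplyr)

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

# pick() returns a tibble holding only the columns matched by
# starts_with("A"); rowSums() then adds them for every row in one
# vectorised call, while ID stays in the result. Assumes dplyr >= 1.1.0.
df_fast <- df %>%
  mutate(SumA = rowSums(pick(starts_with("A"))))

print(df_fast)
```

This drops both `grepl()` and the square brackets from the question's version. Note that `starts_with()` ignores case by default, so pass `ignore.case = FALSE` if a column such as `a1` should be left out.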