Sum across multiple columns with dplyr

Question

My question involves summing values across multiple columns of a data frame and creating a new column holding that sum, using dplyr. The data entries in the columns are binary (0, 1), with some missing values (NA). I am looking for a row-wise analog of the summarise_each or mutate_each functions of dplyr. Below is a minimal example of the data frame:

library(dplyr)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))

> df
   x1 x2 x3 x4 x5
1   1  1  0  1  1
2   0  1  1  0  1
3   0 NA  0 NA NA
4  NA  1  1  1  1
5   0  1  1  0  1
6   1  0  0  0  1
7   1 NA NA NA NA
8  NA NA NA  0  1
9   0  0  0  0  0
10  1  1  1  1  1

I could use something like:

df <- df %>% mutate(sumrow= x1 + x2 + x3 + x4 + x5)

But this would involve writing out the name of every column, and I have about 50 columns. In addition, the column names change between iterations of the loop in which I want to perform this operation, so I would like to avoid specifying any column names.

How can I do that most efficiently? Any assistance would be greatly appreciated.

Recommended answer

dplyr >= 1.0.0 using across

Sum up each row using rowSums (rowwise works for any aggregation, but is slower):

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(across(where(is.numeric))))
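A minimal sketch, assuming dplyr >= 1.0.0 (for `across()`, `where()` and `c_across()`): instead of replacing `NA` with 0 first, `rowSums()` can skip missing values itself via `na.rm = TRUE`. The `rowwise()` + `c_across()` form is the slower, general alternative mentioned above; it is included only for comparison and should give the same totals on this data.

```r
library(dplyr)

df <- data.frame(
  x1 = c(1, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(1, 1, NA, 1, 1, 0, NA, NA, 0, 1),
  x3 = c(0, 1, 0, 1, 1, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 1, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 1, 1, NA, 1, 0, 1)
)

# Vectorised: rowSums() works on all numeric columns at once and
# ignores NA entries because of na.rm = TRUE
fast <- df %>%
  mutate(sum = rowSums(across(where(is.numeric)), na.rm = TRUE))

# Row by row: accepts any aggregation (sum, max, mean, ...) but calls
# it once per row, so it is slower on large data
slow <- df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(where(is.numeric)), na.rm = TRUE)) %>%
  ungroup()
```

Because `rowSums()` operates on the whole block of columns in one call, it is the better choice for plain sums; `rowwise()` is worth its cost only when the per-row function has no vectorised equivalent.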

Sum down each column:

df %>%
   summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
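As a cross-check (a sketch, not part of the original answer), the column sums from `summarise()` can be compared with base R's `colSums()`, which takes the same `na.rm = TRUE` argument but returns a named numeric vector rather than a one-row data frame.

```r
library(dplyr)

df <- data.frame(
  x1 = c(1, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(1, 1, NA, 1, 1, 0, NA, NA, 0, 1),
  x3 = c(0, 1, 0, 1, 1, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 1, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 1, 1, NA, 1, 0, 1)
)

# dplyr: one-row data frame with one total per column, NAs dropped
col_totals <- df %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

# Base R: the same totals as a named numeric vector
base_totals <- colSums(df, na.rm = TRUE)
```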

dplyr <1.0.0

Sum up each row:

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(.[1:5]))

Sum down each column using the superseded summarise_all:

df %>%
   replace(is.na(.), 0) %>%
   summarise_all(sum)
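For the loop described in the question, one option is to wrap the row sum in a function that selects columns by type rather than by name, so it keeps working when the column names change between iterations. This is a sketch assuming dplyr >= 1.0.0; `add_row_sum()` and `frames` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical helper: appends a `sumrow` column with the row-wise sum of
# every numeric column, whatever those columns happen to be called
add_row_sum <- function(data) {
  data %>%
    mutate(sumrow = rowSums(across(where(is.numeric)), na.rm = TRUE))
}

# Hypothetical loop input: data frames whose column names differ
frames <- list(
  data.frame(a = c(1, 0, NA), b = c(1, 1, 0)),
  data.frame(p = c(0, 1), q = c(NA, 1), r = c(1, 1))
)

# Apply the same code to each iteration's data frame
results <- lapply(frames, add_row_sum)
```

Selecting with `where(is.numeric)` means any other numeric column, such as an ID, would also be summed; a tidyselect helper like `starts_with()` can narrow the selection if needed.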
