Sum across multiple columns with dplyr
Problem description
My question involves summing up values across multiple columns of a data frame and creating a new column corresponding to this summation using dplyr. The data entries in the columns are binary (0, 1). I am thinking of a row-wise analog of the summarise_each or mutate_each function of dplyr. Below is a minimal example of the data frame:
library(dplyr)
df=data.frame(
x1=c(1,0,0,NA,0,1,1,NA,0,1),
x2=c(1,1,NA,1,1,0,NA,NA,0,1),
x3=c(0,1,0,1,1,0,NA,NA,0,1),
x4=c(1,0,NA,1,0,0,NA,0,0,1),
x5=c(1,1,NA,1,1,1,NA,1,0,1))
> df
   x1 x2 x3 x4 x5
1   1  1  0  1  1
2   0  1  1  0  1
3   0 NA  0 NA NA
4  NA  1  1  1  1
5   0  1  1  0  1
6   1  0  0  0  1
7   1 NA NA NA NA
8  NA NA NA  0  1
9   0  0  0  0  0
10  1  1  1  1  1
I could use something like:
df <- df %>% mutate(sumrow= x1 + x2 + x3 + x4 + x5)
but this would involve writing out the name of every column, and I have about 50 columns. In addition, the column names change between iterations of the loop in which I want to run this operation, so I would like to avoid giving any column names.
How can I do that most efficiently? Any assistance would be greatly appreciated.
Recommended answer
dplyr >= 1.0.0 using across
Sum up each row using rowSums (rowwise works for any aggregation, but is slower):
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(across(where(is.numeric))))
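The rowwise alternative mentioned above is not shown in the answer, so here is a minimal sketch of it next to a rowSums variant that uses na.rm = TRUE instead of overwriting the NAs with 0. The data frame toy and the result names by_rowsums and by_rowwise are made up for illustration; c_across is the dplyr >= 1.0.0 helper for collecting columns inside a rowwise step.

```r
library(dplyr)

# Small illustrative data frame (hypothetical, not the one from the question)
toy <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, NA, NA),
  x3 = c(0, 1, 1)
)

# Vectorised: rowSums() skips NAs itself, so the original NAs stay in the data
by_rowsums <- toy %>%
  mutate(sum = rowSums(across(where(is.numeric)), na.rm = TRUE))

# Row-wise: slower, but sum() could be swapped for any other aggregation
by_rowwise <- toy %>%
  rowwise() %>%
  mutate(sum = sum(c_across(everything()), na.rm = TRUE)) %>%
  ungroup()
```

Neither version names a column explicitly, which should help when the column names change between loop iterations. On large data the rowSums form is likely the better default, with rowwise reserved for aggregations that have no vectorised row function.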
Sum down each column:
df %>%
# na.rm = TRUE drops the NAs; without it any column containing NA sums to NA
summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
dplyr < 1.0.0
Sum up each row:
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))
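The .[1:5] index above hard-codes the column positions, which the question wanted to avoid. One possible way around that in older dplyr versions is to pick the columns by type with select_if. This is a sketch under that assumption, using a made-up data frame toy rather than the original data.

```r
library(dplyr)

# Hypothetical example data, smaller than the data frame in the question
toy <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, NA, NA),
  x3 = c(0, 1, 1)
)

# select_if() keeps every numeric column, so no names or positions are needed;
# the dot refers to the data frame piped into mutate()
res <- toy %>%
  mutate(sum = rowSums(select_if(., is.numeric), na.rm = TRUE))
```

The dot placeholder relies on the magrittr pipe (%>%), so this form would not carry over unchanged to the native |> pipe.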
Sum down each column using the superseded summarise_all:
df %>%
replace(is.na(.), 0) %>%
# funs() was deprecated in dplyr 0.8.0; passing the function directly works
summarise_all(sum)