Sum across multiple columns with dplyr

Views: 29 | Published: 2021/12/23 12:09:06 | Tags: r, dplyr

Question

My question involves summing up values across multiple columns of a data frame and creating a new column corresponding to this summation using dplyr. The data entries in the columns are binary (0, 1). I am thinking of a row-wise analog of the summarise_each or mutate_each function of dplyr. Below is a minimal example of the data frame:

library(dplyr)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))

> df
   x1 x2 x3 x4 x5
1   1  1  0  1  1
2   0  1  1  0  1
3   0 NA  0 NA NA
4  NA  1  1  1  1
5   0  1  1  0  1
6   1  0  0  0  1
7   1 NA NA NA NA
8  NA NA NA  0  1
9   0  0  0  0  0
10  1  1  1  1  1

I could use something like:

df <- df %>% mutate(sumrow= x1 + x2 + x3 + x4 + x5)

but this would involve writing out the names of each of the columns. I have like 50 columns. In addition, the column names change at different iterations of the loop in which I want to implement this operation, so I would like to avoid having to give any column names.

How can I do that most efficiently? Any assistance would be greatly appreciated.

Answer

dplyr >= 1.0.0 using across

Sum up each row using rowSums (rowwise works for any aggregation, but is slower)

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(across(where(is.numeric))))
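
As a minimal sketch of the rowwise alternative mentioned above, the two approaches can be compared side by side on the question's data. This variant also passes na.rm = TRUE instead of replacing NA with 0, which is an assumption about what is wanted: it leaves the original NA values in the data frame. The names res_fast and res_rowwise are only illustrative.

```r
library(dplyr)

# The example data frame from the question
df <- data.frame(
  x1 = c(1, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(1, 1, NA, 1, 1, 0, NA, NA, 0, 1),
  x3 = c(0, 1, 0, 1, 1, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 1, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 1, 1, NA, 1, 0, 1)
)

# Vectorised: rowSums() over the selected columns, skipping NAs
res_fast <- df %>%
  mutate(sum = rowSums(across(where(is.numeric)), na.rm = TRUE))

# Row by row: c_across() collects the selected columns of the current row,
# so any aggregation function could be used in place of sum()
res_rowwise <- df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(where(is.numeric)), na.rm = TRUE)) %>%
  ungroup()

# Both give row sums of 4 3 0 4 3 2 1 1 0 5
```

Note that with na.rm = TRUE a row containing only NA values sums to 0, the same result the replace() approach gives.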

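Sum down each column

df %>%
   # na.rm = TRUE skips missing values; without it any column containing NA sums to NA
   summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))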

dplyr < 1.0.0

Sum up each row

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(.[1:5]))
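
Since the question mentions that the column names change between loop iterations, one possible variation is to index by a character vector of names rather than by position. This is only a sketch under assumptions: the small data frame, the id column, and the "^x" pattern are hypothetical and stand in for whatever each iteration actually produces.

```r
library(dplyr)

# Hypothetical data: two columns to sum plus one that must be left out
df <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, NA, NA),
  id = c("a", "b", "c")
)

# Hypothetical selection rule; inside a loop this vector would be
# rebuilt on every iteration from the current column names
cols <- grep("^x", names(df), value = TRUE)

# .[cols] is plain data frame indexing on the piped-in data
res <- df %>%
  mutate(sum = rowSums(.[cols], na.rm = TRUE))
```

Because the selection is ordinary base R indexing, this form does not depend on across() and leaves non-selected columns such as id untouched.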

Sum down each column using the superseded summarise_all:

df %>%
   replace(is.na(.), 0) %>%
   # pass the function directly; funs() was deprecated in dplyr 0.8.0
   summarise_all(sum)
