Summarise over all columns

Question

I have data in the following format:

gen = function () sample.int(10, replace = TRUE)
x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())

I would now like to append, to each row, the sum of all the elements in that row (my actual function is more complex, but sum illustrates the problem).

Without dplyr, I'd write

cbind(x, Sum = apply(x, 1, sum))

which results in:

   A C  G T Sum
1  3 1  6 9  19
2  3 4  3 3  13
3  3 1 10 5  19
4  7 2  1 6  16
…

But it seems surprisingly hard to do this with dplyr.

I tried

x %>% rowwise() %>% mutate(Sum = sum(A : T))

But the result is not the sum of the columns of each row; it is something unexpected and (to me) inexplicable.
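
One plausible reading of that result, offered as a hedged sketch rather than a confirmed diagnosis: inside rowwise(), A and T are single values for the current row, so A : T is the ordinary integer sequence from the value of A to the value of T, not a range of columns. The base R snippet below mimics one row with a hypothetical list called row (values taken from the first row of the output above) and needs no dplyr.

```r
# Hypothetical stand-in for a single row of x; the values are for illustration only
row <- list(A = 3L, C = 1L, G = 6L, T = 9L)

# With scalar A and T, `A : T` is seq(A, T), so this sums 3, 4, ..., 9
wrong <- with(row, sum(A:T))

# The intended result: the sum of the four column values in the row
right <- sum(unlist(row))

print(c(wrong = wrong, right = right))
```

If this reading is right, the number returned depends only on the first and last column of each row, which would explain why it looks unrelated to the row total.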

I also tried

x %>% rowwise() %>% mutate(Sum = sum(.))

But here, . is simply a placeholder for the whole of x. Unsurprisingly, providing no argument does not work either (the results are all 0). Needless to say, none of these variants works without rowwise(), either.

(There isn't really any reason why this has to be done in dplyr, but (a) I'd like to keep my code as uniform as possible, and jumping between different APIs doesn't help; and (b) I'm hoping to one day get automatic and free parallelisation of such commands in dplyr.)

Recommended answer

I once did something similar, and at the time I ended up with:

x %>%
  rowwise() %>%
  do(data.frame(., res = sum(unlist(.))))
#    A  C G  T res
# 1  3  2 8  6  19
# 2  6  1 7 10  24
# 3  4  8 6  7  25
# 4  6  4 7  8  25
# 5  6 10 7  2  25
# 6  7  1 2  2  12
# 7  5  4 8  5  22
# 8  9  2 3  2  16
# 9  3  4 7  6  20
# 10 7  5 3  9  24
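
For readers on a newer dplyr, here is a sketch of two alternatives. It is not the approach used in this answer, and it assumes dplyr 1.0.0 or later, where c_across() and across() are available. The set.seed() call and the names res_rowwise and res_vector are additions for illustration only.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
gen <- function() sample.int(10, replace = TRUE)
x <- data.frame(A = gen(), C = gen(), G = gen(), T = gen())

# Row-wise variant: c_across() collects the selected columns of the
# current row into one vector, which any summary function can consume
res_rowwise <- x %>%
  rowwise() %>%
  mutate(Sum = sum(c_across(everything()))) %>%
  ungroup()

# Vectorised variant for the special case of a plain sum
res_vector <- x %>%
  mutate(Sum = rowSums(across(everything())))

print(res_rowwise)
```

The row-wise form generalises to a more complex per-row function; the rowSums() form only helps when the function really is a sum.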

Perhaps your more complex function works fine without unlist, but it seems to be necessary for sum. Because . refers to the "current group", I initially thought that ., e.g. for the first row in the rowwise machinery, would correspond to x[1, ], which is a list that sum happily accepts outside do:

is.list((x[1, ]))
# [1] TRUE

sum(x[1, ])
# [1] 19 

However, without unlist inside do, an error is raised, and I am not sure why:

x %>%
  rowwise() %>%
  do(data.frame(., res = sum(.)))
# Error in sum(.) : invalid 'type' (list) of argument
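
A possible explanation, sketched in base R and resting on the assumption that . inside do() on a rowwise table is a plain list rather than a one-row data frame: x[1, ] is a data frame, and sum() has a method for data frames, whereas a plain list has no such method. The names row_df and row_list are hypothetical, with values copied from the first row of the output above.

```r
# A one-row data frame, like x[1, ]
row_df <- data.frame(A = 3L, C = 2L, G = 8L, T = 6L)

# The same values as a plain list (assumed to resemble `.` inside do())
row_list <- as.list(row_df)

# Works: data frames have a Summary group method that sum() dispatches to
sum_df <- sum(row_df)

# Fails for a plain list; capture the message instead of stopping
err <- tryCatch(sum(row_list), error = function(e) conditionMessage(e))

# unlist() turns the list into an atomic vector that sum() accepts
sum_list <- sum(unlist(row_list))

print(err)
```

Under that assumption, unlist(.) is what bridges the gap, which matches the working version above.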
