Summarise over all columns
Question
I have data in the following format:
gen = function () sample.int(10, replace = TRUE)
x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())
I would now like to attach, to each row, the total sum of all the elements in the row (my actual function is more complex, but sum illustrates the problem).
Without dplyr, I’d write
cbind(x, Sum = apply(x, 1, sum))
which results in:
  A C  G T Sum
1 3 1  6 9  19
2 3 4  3 3  13
3 3 1 10 5  19
4 7 2  1 6  16
…
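For the plain sum, base R's rowSums() gives the same totals without apply(). A minimal sketch, rebuilding the example data with a fixed seed (the seed is an assumption added here for reproducibility and is not in the original):

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
gen <- function() sample.int(10, replace = TRUE)
x <- data.frame(A = gen(), C = gen(), G = gen(), T = gen())

# apply() calls sum() once per row; rowSums() is vectorised over rows
via_apply   <- apply(x, 1, sum)
via_rowsums <- rowSums(x)

# Compare the values only, ignoring names and integer/double storage
same_totals <- identical(as.numeric(via_apply), as.numeric(via_rowsums))
same_totals

# Attach the totals as a new column, as in the question
x_with_sum <- cbind(x, Sum = via_rowsums)
```

This only helps when the per-row function really is a sum; for a more complex function the apply() form from the question is still needed.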
But it seems surprisingly hard to do this with dplyr.
I tried
x %>% rowwise() %>% mutate(Sum = sum(A : T))
But the result is not the sum of the columns of each row; it’s something unexpected and (to me) inexplicable.
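A likely explanation, stated here as an assumption about how the expression is evaluated: inside mutate(), A : T is the ordinary colon operator applied to the values of A and T in the current row, not a column range as in select(). It therefore builds the integer sequence from A to T, and sum() adds up that sequence. The sketch below reproduces this in base R with the values of the first row of the output above (A = 3, T = 9):

```r
# Values taken from the first row of the example output
row <- list(A = 3L, C = 1L, G = 6L, T = 9L)

# A:T is the sequence 3, 4, ..., 9, not "columns A through T"
seq_sum <- with(row, sum(A:T))

# The row total that was actually wanted
row_sum <- sum(unlist(row))

seq_sum  # 42
row_sum  # 19
```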
I also tried
x %>% rowwise() %>% mutate(Sum = sum(.))
But here, . is simply a placeholder for the whole x. Unsurprisingly, providing no argument does not work either (the results are all 0). Needless to say, none of these variants works without rowwise(), either.
(There isn’t really any reason to necessarily do this in dplyr, but (a) I’d like to keep my code as uniform as possible, and jumping between different APIs doesn’t help; and (b) I’m hoping to one day get automatic and free parallelisation of such commands in dplyr.)
Recommended answer
I once did something similar, and at the time I ended up with:
x %>%
rowwise() %>%
do(data.frame(., res = sum(unlist(.))))
# A C G T res
# 1 3 2 8 6 19
# 2 6 1 7 10 24
# 3 4 8 6 7 25
# 4 6 4 7 8 25
# 5 6 10 7 2 25
# 6 7 1 2 2 12
# 7 5 4 8 5 22
# 8 9 2 3 2 16
# 9 3 4 7 6 20
# 10 7 5 3 9 24
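In more recent dplyr releases (1.0.0 and later, which is an assumption about the installed version), do() is superseded, and the same row-wise sum can be sketched with c_across(), which collects the selected columns of the current row into one vector. This is an alternative to the do() approach above, not part of the original answer; the fixed seed is added only for reproducibility.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, not part of the original
gen <- function() sample.int(10, replace = TRUE)
x <- data.frame(A = gen(), C = gen(), G = gen(), T = gen())

res <- x %>%
  rowwise() %>%
  # c_across() gathers this row's columns into a single vector for sum()
  mutate(Sum = sum(c_across(everything()))) %>%
  ungroup()

# Check against the base R row totals
matches_base <- all(res$Sum == rowSums(x))
matches_base
```

For a more complex per-row function, sum can be replaced by that function, provided it accepts a plain vector.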
Perhaps your more complex function works fine without unlist, but it seems to be necessary for sum. Because . refers to the "current group", I initially thought that ., for e.g. the first row in the rowwise machinery, would correspond to x[1, ], which is a list that sum swallows happily outside do:
is.list((x[1, ]))
# [1] TRUE
sum(x[1, ])
# [1] 19
However, without unlist in do, an error is generated, and I am not sure why:
x %>%
rowwise() %>%
do(data.frame(., res = sum(.)))
# Error in sum(.) : invalid 'type' (list) of argument
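A possible explanation, offered as an assumption rather than a confirmed account of dplyr's internals: x[1, ] is still a data frame, and sum() has a method for data frames, whereas the . that do() passes for each row appears to be a plain list, which sum() rejects. The difference can be reproduced in base R without dplyr (the values are those of the first row above):

```r
row_df <- data.frame(A = 3L, C = 2L, G = 8L, T = 6L)

# A one-row data frame is a list, but it also carries the data.frame class,
# so sum() dispatches to the data frame method of the Summary group
df_is_list <- is.list(row_df)  # TRUE
df_sum     <- sum(row_df)      # 19

# Dropping the class leaves a plain list, which sum() cannot handle;
# the error message is captured instead of stopping the script
row_list  <- unclass(row_df)
plain_err <- tryCatch(sum(row_list), error = function(e) conditionMessage(e))
plain_err  # "invalid 'type' (list) of argument"

# unlist() flattens the list into an atomic vector first
list_sum <- sum(unlist(row_list))  # 19
```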