Merging rows of binary data based on columns using ddply
Question
I have the following data frame, in which I want to merge the binary values from a number of rows.
df <- data.frame(ID = c(rep("A", 5), rep("B", 5)), nr = c(rep("2", 5), rep("3", 5)), replicate(10, sample(0:1, 10, replace = TRUE)))
For example:
# ID nr X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# A 2 0 0 1 1 1 1 1 1 1 0
# A 2 1 0 0 0 0 0 0 1 0 1
# A 2 0 0 1 1 1 0 0 0 0 1
# A 2 0 0 0 0 0 1 1 1 0 1
# A 2 0 0 0 1 0 1 1 0 1 1
# B 3 0 1 0 0 1 0 0 0 1 1
# B 3 1 1 0 0 0 0 0 0 0 1
# B 3 1 0 1 0 0 0 1 1 0 1
# B 3 1 1 1 0 1 0 0 1 1 1
# B 3 0 0 0 1 0 0 0 1 0 1
Now I want to merge the rows that share the same values in the first two columns:
df2 = ddply(df, c(1:2), summarise, numcolwise(sum,c(3:12)))
But I get the following error:
Error in vector(type, length) :
vector: cannot make a vector of mode 'closure'.
I would also like anything higher than 1 to be reset to 1 so that the data stays binary, but since I couldn't get past the error I haven't tried that yet.
I know variations of this question have been asked before, but I haven't found one quite like this. Keep in mind that I want to use column indices because I'm working with large data.
Recommended answer
If your data is quite large (as mentioned in the comments), forget about plyr and try data.table:
library(data.table)
setDT(df)[, lapply(.SD, sum), by = list(ID, nr)]
## ID nr X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
## 1: A 2 2 3 5 2 5 2 1 3 4 1
## 2: B 3 3 3 4 1 3 2 3 2 1 4
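The question also asks for values above 1 to be reset to 1 and for columns to be picked by index, which the call above does not cover. A minimal sketch of one way to do both with data.table follows. It assumes every value column holds only 0/1, so taking the group maximum gives the same result as summing and then capping at 1. The small data frame and the names `dt`, `key_cols`, `val_cols` and `merged` are made up for illustration.

```r
library(data.table)

# Small hand-made example (hypothetical values, same layout as the question's df)
df <- data.frame(
  ID = c("A", "A", "A", "B", "B"),
  nr = c("2", "2", "2", "3", "3"),
  X1 = c(0L, 1L, 0L, 0L, 0L),
  X2 = c(0L, 0L, 0L, 1L, 1L),
  X3 = c(1L, 1L, 1L, 0L, 1L)
)

dt <- as.data.table(df)              # works on a copy, df itself is unchanged
key_cols <- names(dt)[1:2]           # grouping columns, chosen by index
val_cols <- names(dt)[3:ncol(dt)]    # binary columns, chosen by index

# max() of a 0/1 column is 1 when any row in the group holds a 1,
# which equals summing and then resetting everything above 1 to 1
merged <- dt[, lapply(.SD, max), by = key_cols, .SDcols = val_cols]
print(merged)
```

If the actual counts are needed as well, the same call with `sum` in place of `max` gives them, and `pmin(x, 1L)` applied afterwards caps them.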
Or, if you want to stick with the plyr family, move on to its next generation, dplyr:
library(dplyr)
df %>%
group_by(ID, nr) %>%
summarise_each(funs(sum))
# Source: local data table [2 x 12]
# Groups: ID
#
# ID nr X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 A 2 2 3 5 2 5 2 1 3 4 1
# 2 B 3 3 3 4 1 3 2 3 2 1 4
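Note that summarise_each() and funs() have been deprecated in later dplyr releases. A rough equivalent, assuming dplyr 1.0.0 or later, uses across(). The small data frame below is made up for illustration and is not the question's data.

```r
library(dplyr)

# Small hand-made example (hypothetical values, same layout as the question's df)
df <- data.frame(
  ID = c("A", "A", "B", "B"),
  nr = c("2", "2", "3", "3"),
  X1 = c(0L, 1L, 1L, 1L),
  X2 = c(0L, 0L, 1L, 0L)
)

merged <- df %>%
  group_by(ID, nr) %>%
  # inside across(), everything() skips the grouping columns,
  # so only X1 and X2 are summed
  summarise(across(everything(), sum), .groups = "drop")

print(merged)
```

Replacing `sum` with `max` in the across() call should keep the result binary, under the same assumption that the columns contain only 0 and 1.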