Create data.frame conditional on another df without for loop
Problem description
I'm trying to create a data.frame whose values depend on the values in a reference data.frame. I only know how to do this with a for loop, but I have been advised to avoid for loops in R, and my actual data have ~500,000 rows x ~200 columns.
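# Reference data: a is a random 5 x 2 matrix of 0s and 1s with columns "a" and "b"
a <- as.data.frame(matrix(rbinom(10, 1, 0.5), 5, 2,
                          dimnames = list(c(1:5), c("a", "b"))))
# v2 names the group (matching a column of a) that each v1 value belongs to
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13),
                v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))
c <- as.data.frame(matrix(0, 5, 2))
for (i in 1:5) {
  for (j in 1:2) {
    if (a[i, j] == 1) {
      # 1 in a: mean of v1 for the group named by column j of a
      c[i, j] <- mean(b$v1[b$v2 == colnames(a)[j]])
    } else {
      # otherwise: overall mean of v1
      c[i, j] <- mean(b$v1)
    }
  }
}
c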
I create data.frame "c" from the value in each cell of data.frame "a" together with that cell's column name. Is there another way to do this? Indexing? data.table? Maybe an apply function? Any help is greatly appreciated!
Answer
(a == 0) * mean(b$v1) + t(t(a) * c(tapply(b$v1, b$v2, mean)))
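Run it piece by piece to see what is happening. Note that this assumes the columns of a are in the same order as the groups returned by tapply (for a character v2, alphabetical order), and that a contains only 0s and 1s, as in the question.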
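To follow the suggestion of running the expression in pieces, here is a sketch that splits it into named steps and checks the result against the question's loop on the same data. The names group_means, overall, zero_part, one_part and loop_c are illustrative only, not from the answer. Because a is drawn with rbinom, its contents change from run to run.

```r
# Example data from the question; a is random, so its contents vary per run
a <- as.data.frame(matrix(rbinom(10, 1, 0.5), 5, 2,
                          dimnames = list(c(1:5), c("a", "b"))))
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13),
                v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))

# Piece 1: mean of v1 per group, in tapply's group order (values a = 3.2, b = 12)
group_means <- c(tapply(b$v1, b$v2, mean))

# Piece 2: overall mean of v1 (7.6), used wherever a is 0
overall <- mean(b$v1)

# Piece 3: (a == 0) is a logical matrix; multiplying by overall gives
# overall where a is 0 and 0 where a is 1
zero_part <- (a == 0) * overall

# Piece 4: t(a) is 2 x 5, so the length-2 group_means is recycled down each
# column: row "a" is scaled by group_means[1], row "b" by group_means[2].
# Transposing back gives a 5 x 2 matrix with the group mean where a is 1, else 0.
one_part <- t(t(a) * group_means)

result <- zero_part + one_part

# Cross-check against the question's loop on the same data
loop_c <- as.data.frame(matrix(0, 5, 2))
for (i in 1:5) {
  for (j in 1:2) {
    loop_c[i, j] <- if (a[i, j] == 1) mean(b$v1[b$v2 == colnames(a)[j]]) else mean(b$v1)
  }
}
stopifnot(isTRUE(all.equal(unname(result), unname(as.matrix(loop_c)))))
print(result)
```

The step that is easiest to miss is piece 4: R recycles a vector down the columns of a matrix, which is why a is transposed first so that each group mean lines up with its own column of a.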
An alternative to the t() calls above is mapply (this assumes a is a data.frame or data.table rather than a matrix, whereas the version above works with either):
(a == 0) * mean(b$v1) + mapply(`*`, a, tapply(b$v1, b$v2, mean))
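If the column order of a cannot be guaranteed, one option is to look up each column's group mean by name and fill cells by logical indexing, which also speaks to the question's "Indexing?" idea. The function fill_by_flag below is a hypothetical helper written for illustration; it assumes b has columns v1 and v2 as in the question and returns a matrix. Unlike the (a == 0) * ... form, any cell that is not exactly 1 gets the overall mean, which mirrors the loop's if/else.

```r
# Hypothetical helper (not from the original answer): where a cell of a is 1,
# use the mean of v1 for that column's group; otherwise use the overall mean.
fill_by_flag <- function(a, b) {
  a <- as.matrix(a)
  # Look group means up by column name, so the column order of a does not matter
  # (a column name with no matching group in b$v2 yields NA)
  group_means <- tapply(b$v1, b$v2, mean)[colnames(a)]
  out <- matrix(mean(b$v1), nrow(a), ncol(a), dimnames = dimnames(a))
  ones <- a == 1
  # col(a) holds each cell's column index; both sides use column-major order
  out[ones] <- group_means[col(a)[ones]]
  out
}

# Example data from the question
a <- as.data.frame(matrix(rbinom(10, 1, 0.5), 5, 2,
                          dimnames = list(c(1:5), c("a", "b"))))
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13),
                v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))

res <- fill_by_flag(a, b)

# Reordering the columns of a only reorders the output columns
res_rev <- fill_by_flag(a[, c("b", "a")], b)
stopifnot(isTRUE(all.equal(res[, c("b", "a")], res_rev)))

# Agrees with the mapply version when a holds 0s and 1s in sorted column order
via_mapply <- (a == 0) * mean(b$v1) + mapply(`*`, a, tapply(b$v1, b$v2, mean))
stopifnot(isTRUE(all.equal(unname(res), unname(via_mapply))))
print(res)
```

The trade-off is that this builds a full numeric matrix the size of a, just as the arithmetic versions do, but it avoids relying on the positional match between the columns of a and the output of tapply.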