Filter rows according to rowwise function (dplyr)
Problem description
Could you please help me do the filtering in the last command below using dplyr instead of apply? I was trying to solve the problem posted here.
library(gtools)
n <- 8
dt <- permutations(n+1, 6, v=0:n, repeats.allowed=TRUE)
SmplMode <- function(x) {
  tabSmpl <- tabulate(x)
  SmplMode <- which(tabSmpl == max(tabSmpl))
  if (sum(tabSmpl == max(tabSmpl)) > 1)
    SmplMode <- 0
  return(SmplMode)
}
res <- dt[apply(dt, 1, function(x) {
  y <- rep(c(1,2,3,4,5,6), c(x[1],x[2],x[3],x[4],x[5],x[6]))
  return(mean(y)==3 & diff(range(y))==4 & median(y)==3.5 & SmplMode(y)==4)
}),]
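To make the four conditions concrete, here is a minimal sketch that evaluates them for a single row. The row `x` is a hypothetical example chosen for illustration, not a value from the original post, and the helper is a compact restatement of `SmplMode` rather than the exact function above.

```r
# Hypothetical row of counts: how many times each value 1..6 occurs
x <- c(2, 1, 1, 3, 1, 0)

# Expand the counts into the sample they describe
y <- rep(1:6, x)

# Compact restatement of the helper: the unique mode, or 0 when there is a tie
SmplMode <- function(v) {
  tab <- tabulate(v)
  top <- which(tab == max(tab))
  if (length(top) > 1) 0 else top
}

# The four conditions used in the filter
checks <- c(
  mean   = mean(y) == 3,
  range  = diff(range(y)) == 4,
  median = median(y) == 3.5,
  mode   = SmplMode(y) == 4
)
keep <- all(checks)
print(checks)
```

Each row of `dt` goes through this expansion and these four tests inside `apply`, which is why the row-by-row version is slow on a matrix of this size.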
Solution
Operations with rowwise are slow, so applying the SmplMode(y), mean(y), and diff(range(y)) conditions early on, with the help of row-wise operations from the matrixStats package, speeds things up nicely. Only the rows that survive those filters need the rowwise median. The following runs in about 0.4 seconds on my laptop, while both your original solution and @shadow's solution run in about 70 seconds.
library(dplyr)
library(matrixStats)
df <- data.frame(dt)
df$m <- rowMaxs(dt) # for SmplMode(y)
S <- matrix(1:6, ncol=ncol(dt), nrow=nrow(dt), byrow=TRUE)
Z <- S*(dt!=0)
Z[Z==0] <- NA
df$Range <- rowMaxs(Z, na.rm=TRUE)-rowMins(Z, na.rm=TRUE) # for diff(range(y))
df$Mean <- rowSums(S*dt)/rowSums(dt) # for mean(y)
res <- df %>%
  filter(X4 == m, (X1==m)+(X2==m)+(X3==m)+(X4==m)+(X5==m)+(X6==m)==1, # mode condition here
         Range == 4, # range condition here
         Mean == 3) %>% # mean condition here
  rowwise() %>%
  mutate(Med = median(rep(c(1,2,3,4,5,6), c(X1, X2, X3, X4, X5, X6)))) %>%
  filter(Med == 3.5) %>% # median condition here
  select(-m, -Range, -Mean, -Med) %>% # get rid of the new columns
  as.matrix
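The vectorized columns rely on three identities: mean(y) equals the count-weighted average of 1..6, diff(range(y)) equals the distance between the first and last non-zero counts, and SmplMode(y) == 4 holds exactly when X4 is the single largest count. The sketch below checks these against the row-by-row computation on a small random sample. It is an illustration under assumptions: the sample matrix is made up, and base R functions (`pmax`, `max.col`) stand in for the matrixStats calls so that the block needs no extra packages.

```r
set.seed(1)
# Hypothetical sample of count rows (values 0..8, six columns)
dt <- matrix(sample(0:8, 6 * 200, replace = TRUE), ncol = 6)
dt <- dt[rowSums(dt) > 0, , drop = FALSE]  # skip rows describing an empty sample
dt <- rbind(dt, c(2, 1, 1, 3, 1, 0))       # add one row known to pass all filters

# Reference: expand every row and compute the statistics directly
ref <- t(apply(dt, 1, function(x) {
  y <- rep(1:6, x)
  tab <- tabulate(y, nbins = 6)
  c(mean  = mean(y),
    range = diff(range(y)),
    mode4 = sum(tab == max(tab)) == 1 && which.max(tab) == 4)
}))

# Vectorized equivalents, computed on whole columns at once
S <- matrix(1:6, nrow = nrow(dt), ncol = 6, byrow = TRUE)
vec_mean <- rowSums(S * dt) / rowSums(dt)

nz <- (dt != 0) * 1  # 1 where a value occurs at least once
vec_range <- max.col(nz, ties.method = "last") -
  max.col(nz, ties.method = "first")

m <- do.call(pmax, as.data.frame(dt))  # row maxima, base R stand-in for rowMaxs
vec_mode4 <- dt[, 4] == m & rowSums(dt == m) == 1

# Compare the two approaches
same_mean  <- isTRUE(all.equal(unname(ref[, "mean"]), vec_mean))
same_range <- all(ref[, "range"] == vec_range)
same_mode  <- all((ref[, "mode4"] == 1) == vec_mode4)
print(c(mean = same_mean, range = same_range, mode = same_mode))
```

The median has no comparably simple column-wise form here, which is presumably why the answer leaves it to `rowwise()` after the cheaper filters have already discarded most rows.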