Filter rows according to rowwise function (dplyr)


Question

Could you please help me do the filtering in the last command below, using dplyr instead of apply?

I was trying to solve the problem posted here

library(gtools)
n <- 8
dt <- permutations(n+1,6,v=0:n,repeats.allowed=TRUE)

SmplMode <- function(x) {
  tabSmpl <- tabulate(x)
  SmplMode <- which(tabSmpl == max(tabSmpl))
  if (sum(tabSmpl == max(tabSmpl)) > 1)
    SmplMode <- 0
  return(SmplMode)
}

res <- dt[apply(dt,1,function(x) {
  y <- rep(c(1,2,3,4,5,6),c(x[1],x[2],x[3],x[4],x[5],x[6]))
  return(mean(y)==3 & diff(range(y))==4 & median(y)==3.5 & SmplMode(y)==4)
  }),]
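To make the four conditions concrete, here is a small worked example on a single row. The row `c(2, 1, 1, 3, 1, 0)` is picked purely for illustration and is not taken from the original post; each entry says how many times the values 1 to 6 occur in the sample `y`. The helper `smpl_mode` is a renamed sketch of `SmplMode` with the same logic.

```r
# Mode helper, same logic as SmplMode in the question:
# returns the unique mode, or 0 when the highest count is tied.
smpl_mode <- function(x) {
  tab <- tabulate(x)
  top <- which(tab == max(tab))
  if (length(top) > 1) 0 else top
}

# Illustrative row of counts (an assumption, not from the post).
x <- c(2, 1, 1, 3, 1, 0)

# Expand the counts into the sample they describe.
y <- rep(c(1, 2, 3, 4, 5, 6), x)
y                     # 1 1 2 3 4 4 4 5

mean(y) == 3          # TRUE: 24 / 8
diff(range(y)) == 4   # TRUE: 5 - 1
median(y) == 3.5      # TRUE: (3 + 4) / 2
smpl_mode(y) == 4     # TRUE: 4 occurs three times, more than any other value
```

A row is kept in `res` only when all four comparisons are TRUE, as they are here.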

Solution

Operations with rowwise are slow, so filtering on the SmplMode(y), mean(y), and diff(range(y)) conditions early, with the help of the vectorised row-wise operations from the matrixStats package, speeds things up nicely. Only the median condition is left for rowwise, and by then few rows remain. The following runs in about 0.4 sec on my laptop, while both your original solution and @shadow's solution run in about 70 sec.

library(dplyr)
library(matrixStats)

df <- data.frame(dt)

df$m <- rowMaxs(dt)                                       # for SmplMode(y)
S <- matrix(1:6, ncol=ncol(dt), nrow=nrow(dt), byrow=TRUE)
Z <- S*(dt!=0)
Z[Z==0] <- NA
df$Range <- rowMaxs(Z, na.rm=TRUE)-rowMins(Z, na.rm=TRUE) # for diff(range(y))
df$Mean <- rowSums(S*dt)/rowSums(dt)                      # for mean(y)

res <- df %>% 
  filter(X4  == m, (X1==m)+(X2==m)+(X3==m)+(X4==m)+(X5==m)+(X6==m)==1, 
         Range == 4, # range condition here
         Mean == 3) %>% #mean condition here
  rowwise() %>% 
  mutate(Med = median(rep(c(1,2,3,4,5,6), c(X1, X2, X3, X4, X5, X6)))) %>%
  filter(Med == 3.5) %>%   #median condition here 
  select(-m, -Range, -Mean, -Med) %>% # get rid of newcols
  as.matrix()
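As a sanity check on the pre-filtering idea, here is a minimal sketch in base R that compares the vectorised conditions against the plain `apply` version on a small grid. It rests on several assumptions that are not in the original answer: counts only go up to 3 instead of 8, `expand.grid` stands in for `gtools::permutations`, `apply(..., max)` and `apply(..., min)` stand in for `rowMaxs` and `rowMins` so that no extra packages are needed, and the all-zero row is dropped up front to avoid empty samples. The names `smpl_mode`, `slow`, `keep`, and `fast` are illustrative.

```r
# Small grid: each row holds the counts (0..3) of the values 1..6.
n <- 3
dt <- as.matrix(expand.grid(rep(list(0:n), 6)))
dimnames(dt) <- NULL
dt <- dt[rowSums(dt) > 0, ]          # drop the empty sample

vals <- c(1, 2, 3, 4, 5, 6)

smpl_mode <- function(x) {
  tab <- tabulate(x)
  top <- which(tab == max(tab))
  if (length(top) > 1) 0 else top
}

# Reference: evaluate all four conditions row by row.
slow <- apply(dt, 1, function(x) {
  y <- rep(vals, x)
  mean(y) == 3 && diff(range(y)) == 4 && median(y) == 3.5 && smpl_mode(y) == 4
})

# Vectorised pre-filter for the mode, range, and mean conditions.
S <- matrix(1:6, nrow = nrow(dt), ncol = 6, byrow = TRUE)
Z <- S * (dt != 0)
Z[Z == 0] <- NA
m <- apply(dt, 1, max)
keep <- dt[, 4] == m & rowSums(dt == m) == 1 &          # unique mode is 4
  apply(Z, 1, max, na.rm = TRUE) -
    apply(Z, 1, min, na.rm = TRUE) == 4 &               # range is 4
  rowSums(S * dt) / rowSums(dt) == 3                    # mean is 3

# The median is computed only for the rows that survived.
idx <- which(keep)
med <- vapply(idx, function(i) median(rep(vals, dt[i, ])), numeric(1))
fast <- rep(FALSE, nrow(dt))
fast[idx[med == 3.5]] <- TRUE

stopifnot(identical(unname(slow), fast))
```

If the final `stopifnot` passes, both approaches select the same rows on this grid; the speed-up in the answer comes from running the expensive per-row step on only the survivors.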
