Find a sequence of rules in a dataframe, with break rules


Problem description

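Here is how I want the algorithm to work. I split it into two steps:

Step 1: sequential search for the "next" rules

Step 2: check for "break" rules (rules that must not fire between the matched "next" rules)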
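The two steps can be sketched on plain row indices. This is a minimal illustration, not the code from the question or the answer: it assumes each rule has already been evaluated into a logical vector over the rows, and `find_next_rows()` and `has_break_between()` are hypothetical helper names. Step 1 takes, for each "next" rule in order, the earliest row after the previous match. Step 2 uses one vectorized test: a row lies strictly inside some window between consecutive matches exactly when it is strictly between the first and last match and is not itself a match.

```r
# Step 1 (sketch): greedy sequential search.
# `next_hits` is a list of logical vectors, one per "next" rule, in rule order.
# Returns the matched row of each rule, or NULL if the sequence is incomplete.
find_next_rows <- function(next_hits) {
  pos <- 0L                      # row of the previous match (0 = none yet)
  rows <- integer(0)
  for (h in next_hits) {
    k <- which(h)                # rows where this rule holds; NA counts as FALSE
    k <- k[k > pos]              # must come strictly after the previous match
    if (length(k) == 0) return(NULL)
    pos <- k[1]                  # which() is sorted, so this is the earliest row
    rows <- c(rows, pos)
  }
  rows
}

# Step 2 (sketch): does any "break" row fall strictly between two
# consecutive matched rows?
has_break_between <- function(next_rows, break_rows) {
  any(break_rows > min(next_rows) & break_rows < max(next_rows) &
        !(break_rows %in% next_rows))
}

next_hits <- list(c(FALSE, TRUE, FALSE, FALSE, TRUE),   # "next" rule 1
                  c(TRUE, FALSE, FALSE, TRUE, FALSE))   # "next" rule 2
rows <- find_next_rows(next_hits)      # c(2L, 4L)
has_break_between(rows, 3L)            # TRUE: row 3 lies between rows 2 and 4
has_break_between(rows, 5L)            # FALSE: row 5 is after the last match
```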

set.seed(123)
dat <- as.data.frame(matrix(sample(10,60,replace = T),ncol = 3))
colnames(dat) <- LETTERS[1:ncol(dat)]
dat

rule <- c("A==0","A==10 & B==4","C==9","A>10","B<0","C==0","A==5","A>10",
          "B<0","C==0","A==9 & B==9","A>10","B<0","A==10","A==7 & B==5")
action <- c("break","next","next",rep("break",3),"next",rep("break",3),
            "next",rep("break",3) ,"next")

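# keep the rules in a data frame so that rule$rule and rule$action work
rule <- data.frame(rule, action, stringsAsFactors = FALSE)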
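Each rule is stored as a text string and evaluated against the columns of `dat`. A minimal sketch of that evaluation step, using base R's `str2expression()` (available from R 3.6.0) and `eval()` with the data frame as the environment. The data frame and rule table here are small made-up values, not the question's random data.

```r
# Hand-made example data (assumption: not the question's data)
dat <- data.frame(A = c(10, 9, 3, 5),
                  B = c(4, 9, 2, 1),
                  C = c(1, 2, 9, 0))
rules <- data.frame(rule   = c("A==10 & B==4", "C==9", "A==9 & B==9", "B<0"),
                    action = c("next", "next", "break", "break"),
                    stringsAsFactors = FALSE)

# str2expression() parses each string; eval() looks the column names up in dat.
# The result is one logical vector per rule, with one element per row.
hits <- lapply(rules$rule, function(r) eval(str2expression(r), envir = dat))

next_hits  <- hits[rules$action == "next"]   # logical vectors, in rule order
break_rows <- sort(unique(unlist(lapply(hits[rules$action == "break"], which))))
break_rows                                   # 2: "A==9 & B==9" holds on row 2
```

Because the answer's `seq_rule2()` reads the rules through `rule$rule` and `rule$action`, the rule table has to be a data frame: `cbind()` of two character vectors produces a character matrix, and `$` is not valid on a matrix.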

Answer

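I want to say a very big thank-you to everyone who tried to help me, and for your endless patience. But it was impossible to help me, because I did not fully understand myself what I wanted. Instead of splitting the problem into parts and asking about them separately (as I should have), I asked one big question that I could hardly explain even to myself.

I am very, very sorry about that. Here is my answer; it is what I ultimately wanted to get.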

seq_rule2 <- function(dat, rule, res.only = TRUE) {

  # This is a fast function written by Thomas here
  # https://stackoverflow.com/questions/68625542/match-all-logic-rules-with-a-dataframe-need-super-fast-function
  # as an answer to my earlier question.
  # It takes the rules as a vector and looks for the sequence.
  seq_rule <- function(dat, rule, res.only = TRUE) {
    # one logical vector per rule, evaluated against the columns of dat
    m <- with(dat, lapply(rule, function(r) eval(str2expression(r))))
    # first row after x where rule y holds, or NA if there is none
    fu <- function(x, y) {
      k <- which(y)
      ifelse(all(k <= x), NA, min(k[k > x]))
    }
    idx <- na.omit(Reduce(fu, m, init = 0, accumulate = TRUE))[-1]
    if (!res.only) {
      fidx <- head(idx, length(rule))
      debug.vec <- replace(rep("no", nrow(dat)), fidx, rule[seq_along(fidx)])
      return(cbind(dat, debug.vec))
    }
    length(idx) >= length(rule)
  }

  is.next  <- rule$action == "next"
  is.break <- rule$action == "break"

  # With at most one "next" rule there is no sequence to look for:
  # return FALSE and stop.
  if (sum(is.next) <= 1) return(FALSE)

  # STEP 1
  # run seq_rule on the "next" rules only
  yes.next.rule.seq <- seq_rule(dat = dat, rule = rule$rule[is.next], res.only = TRUE)

  if (!yes.next.rule.seq) {
    if (res.only) return(FALSE)
    # no sequence found: return dat with all-"no" annotation columns
    return(cbind(dat, Next = rep("no", nrow(dat)), Break = rep("no", nrow(dat))))
  }

  # The sequence was found (TRUE) but there are no "break" rules,
  # so there is nothing left to check: return TRUE and stop.
  if (!any(is.break)) return(TRUE)

  # STEP 2
  # look for "break" rules in the range between "next" rules

  # row indices where the "next" rules triggered in dat
  deb.vec <- seq_rule(dat = dat, rule = rule$rule[is.next], res.only = FALSE)[, "debug.vec"]
  idx.next.rules <- which(deb.vec != "no")

  # row indices where the "break" rules triggered in dat
  m <- with(dat, lapply(rule$rule[is.break], function(r) eval(str2expression(r))))
  idx.break.rules <- unlist(lapply(m, which))

  # RES is TRUE unless a "break" rule fires between two consecutive "next" rules
  RES <- TRUE

  # sliding window over pairs of consecutive "next" rules  http://prntscr.com/1qhnzae
  for (i in seq_along(idx.next.rules)[-1]) {
    temp.range <- idx.next.rules[(i - 1):i]
    # is there any "break" row strictly between the two "next" rows?
    break.detect <- any(idx.break.rules > temp.range[1] & idx.break.rules < temp.range[2])
    # braces are required: without them `break` would run on every iteration
    if (break.detect) {
      RES <- FALSE
      break
    }
  }

  if (!res.only) {
    Next  <- rep("no", nrow(dat)); Next[idx.next.rules]   <- "yes"
    Break <- rep("no", nrow(dat)); Break[idx.break.rules] <- "yes"
    return(cbind(dat, Next = Next, Break = Break))
  }
  RES
}
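One detail of the sliding-window loop deserves care: in R, `;` ends a statement, so `if (cond) RES <- FALSE ; break` makes only the assignment conditional and runs `break` on every pass, which means only the first window is ever checked. The standalone sketch below contrasts that form with the braced `{ RES <- FALSE; break }`; the function names and row indices are hypothetical.

```r
# Unbraced form: the if() body is only `res <- FALSE`;
# the `break` after the semicolon always runs, so the loop stops after window 1.
check_windows_semicolon <- function(next_rows, break_rows) {
  res <- TRUE
  for (i in seq_along(next_rows)[-1]) {
    w <- next_rows[(i - 1):i]
    if (any(break_rows > w[1] & break_rows < w[2])) res <- FALSE; break
  }
  res
}

# Braced form: the loop stops only when a break row is found.
check_windows_braced <- function(next_rows, break_rows) {
  res <- TRUE
  for (i in seq_along(next_rows)[-1]) {
    w <- next_rows[(i - 1):i]
    if (any(break_rows > w[1] & break_rows < w[2])) { res <- FALSE; break }
  }
  res
}

next_rows <- c(2L, 5L, 9L)
check_windows_semicolon(next_rows, 7L)  # TRUE: window (5, 9) is never checked
check_windows_braced(next_rows, 7L)     # FALSE: row 7 lies between rows 5 and 9
```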

Data to test it with:

set.seed(963)
dat <- as.data.frame(matrix(sample(10,30,replace = T),ncol = 3))
colnames(dat) <- LETTERS[1:ncol(dat)]
rule <- data.frame(rule   = c("A==9", "B==4", "C==4", "A==4"),
                   action = c("next", "break", "break", "next"),
                   stringsAsFactors = FALSE)
seq_rule2(dat = dat, rule = rule)
dat
rule

Example without a break, set.seed(963): http://prntscr.com/1qhprxq

Example with a break, set.seed(930): http://prntscr.com/1qhpv2h
