间隔在R中设置代数(联合,交集,差,包含,...) [英] Interval sets algebra in R (union, intersection, differences, inclusion, ...)

查看:78
本文介绍了间隔在R中设置代数(联合,交集,差,包含,...)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道R中是否存在用于间隔操纵和比较的适当框架.

I am wondering whether a proper framework for interval manipulation and comparison does exist in R.

经过一些搜索,我只能找到以下内容: -基本Package中的函数findInterval. (但我很难理解) -有关联合和相交的一些答案(特别是:

After some search, I was only able to find the following: - function findInterval in base Package. (but I hardly understand it) - some answers here and there about union and intersection (notably: http://r.789695.n4.nabble.com/Union-Intersect-two-continuous-sets-td4224545.html)

您是否知道实施一套全面的工具以轻松处理时间间隔操纵中的频繁任务的计划,例如include/setdiff/union/intersection/etc. (例如,请参见此处以获取功能列表)? 或您对开发这种方法有何建议?

Would you know of an initiative to implement a comprehensive set of tools to easily handles frequent tasks in interval manipulation, like inclusion/setdiff/union/intersection/etc. (eg see here for a list of functionalities)? or would you have advice in developing such an approach?

以下是我这一边的一些草稿.它肯定很笨拙,并且仍然存在一些错误,但可能可以说明我在寻找什么.

below are some drafts on my side for doing so. it is surely awkward and still has some bugs but it might illustrate what I am looking for.

有关所采取选项的初步方面 -应无缝处理间隔或设置的间隔 -间隔表示为2列data.frames(较低边界,较高边界),位于一行上 -间隔集表示为2列,其中有几行 -可能需要第三列来标识时间间隔集

preliminary aspects about the options taken - should deal seamlessly with intervals or intervals set - intervals are represented as 2 columns data.frames (lower boundary, higher boundary), on one row - intervals sets are represented as 2 columns with several rows - a third column might be needed for identification of intervals sets

UNION

    interval_union <- function(df){   # for data frame

    df <- interval_clean(df)
    if(is.empty(df)){
        return(as.data.frame(NULL))
    } else {

        if(is.POSIXct(df[,1])) {
            dated <- TRUE
            df <- colwise(as.numeric)(df)
        } else {
            dated <- FALSE
        }
        M <- as.matrix(df)

        o <- order(c(M[, 1], M[, 2])) 
        n <- cumsum( rep(c(1, -1), each=nrow(M))[o]) 
        startPos <- c(TRUE, n[-1]==1 & n[-length(n)]==0) 
        endPos <- c(FALSE, n[-1]==0 & n[-length(n)]==1) 

        M <- M[o] 

        if(dated == TRUE) {
            df2 <- colwise(mkDateTime)(as.data.frame(cbind(M[startPos], M[endPos])), from.s = TRUE)
        } else {
            df2 <- as.data.frame(cbind(M[startPos], M[endPos]))
        }
        colnames(df2) <- colnames(df)

        # print(df2)
        return(df2)

    }


}


union_1_1 <- function(test, ref){
    names(ref) <- names(test)
    tmp <- interval_union(as.data.frame(rbind(test, ref)))
    return(tmp)
}


union_1_n <- function(test, ref){
    return(union_1_1(test, ref))
}


union_n_n <- function(test, ref){
    testnn <- adply(.data = test, 1, union_1_n, ref, .expand = FALSE)
    return(testnn)
}

ref_interval_union <- function(df, ref){

    tmp0 <- adply(df, 1, union_1_1, ref, .expand = FALSE) # set to FALSE to keep ID
    return(tmp0)                
}

交叉点

interval_intersect <- function(df){
    # adapted from : http://r.789695.n4.nabble.com/Union-Intersect-two-continuous-sets-td4224545.html
    M <- as.matrix(df)

    L <- max(M[, 1])
    R <- min(M[, 2]) 

    Inew <- if (L <= R) c(L, R) else c() 

    if (!is.empty(Inew)){
        df2 <- t(as.data.frame(Inew)) 
        colnames(df2) <- colnames(df)
        rownames(df2) <- NULL
    } else {
        df2 <- NULL
    }

    return(as.data.frame(df2))

}



ref_interval_intersect <- function(df, ref){

    tmpfun <- function(a, b){

        names(b) <- names(a)
        tmp <- interval_intersect(as.data.frame(rbind(a, b)))
        return(tmp)
    }

    tmp0 <- adply(df, 1, tmpfun, ref, .expand = FALSE) # [,3:4]
    #if(!is.empty(tmp0)) colnames(tmp0) <- colnames(df)
    return(tmp0)                
}


int_1_1 <- function(test, ref){

    te <- as.vector(test)
    re <- as.vector(ref)
    names(re) <- names(te)
    tmp0 <- c(max(te[1, 1], re[1, 1]), min(te[1, 2], re[1, 2]))

    if(tmp0[1]>tmp0[2]) tmp0 <- NULL   # inverse of a correct interval --> VOID

    if(!is.empty(tmp0)){
        tmp1 <- colwise(mkDateTime)(as.data.frame(t(as.data.frame(tmp0))))
        colnames(tmp1) <- colnames(test)
    } else {
        tmp1 <- data.frame(NULL)
    }

    return(tmp1)

}


int_1_n <- function(test, ref){

    test1 <- adply(.data = ref, 1, int_1_1, test = test, .expand = FALSE)

    if(is.empty(test1)){
        return(data.frame(NULL))
    } else {

        testn <- interval_union(test1[,2:3])    
        return(testn)
    }

}


int_n_n <- function(test, ref){

    testnn <- adply(.data = test, 1, int_1_n, ref, .expand = FALSE)
    # return(testnn[,2:3])  # return interval set without index (1st column)
    return(testnn)          # return interval set with index (1st column) --> usefull to go with merge to keep metadata going alon g with interval description
}


int_intersect <- function(df, ref){

    mycols <- colnames(df)
    df$X1 <- 1:nrow(df)
    test <- df[, 1:2]
    tmp <- int_n_n(test, ref)

    intersection <- merge(tmp, df, by = "X1", suffixes = c("", "init"))
    return(intersection[,mycols])   

}

排除

excl_1_1 <- function(test, ref){
    te <- as.vector(test)
    re <- as.vector(ref)
    names(re) <- names(te)


    if(te[1] < re[1]){          # Lower Bound
        if(te[2] > re[1]){          # overlap
            x <- unlist(c(te[1], re[1]))
        } else {                    # no overlap
            x <- unlist(c(te[1], te[2]))
        }
    } else {                    # test > ref on lower bound side
        x <- NULL
    }

    if(te[2] > re[2]){          # Upper Bound
        if(te[1] < re[2]){          # overlap
            y <- unlist(c(re[2], te[2]))    
        } else {                    # no overlap
            y <- unlist(c(te[1], te[2]))
        }
    } else {                    # test < ref on upper bound side
        y <- NULL
    }

    if(is.empty(x) & is.empty(y)){
        tmp0 <- NULL
        tmp1 <- tmp0
    } else {

        tmp0 <- as.data.frame(rbind(x, y))
        colnames(tmp0) <- colnames(test)
        tmp1 <- interval_union(tmp0)    

    }

    return(tmp1)    

}



excl_1_n <- function(test, ref){


    testn0 <- adply(.data = ref, 1, excl_1_1, test = test, .expand=FALSE)

    # boucle pour intersecter successivement les intervalles sets, pour gérer les intervalles disjoints (identifiés par X1, col1)

    tmp <- range(testn0)
    names(tmp) <- colnames(testn0)[2:3]
    tmp <- as.data.frame(t(tmp))

    for(i in unique(testn0[,1])){
        tmp <- int_n_n(tmp, testn0[testn0[,1]==i, 2:3])
    }
    return(tmp)

}

加入

incl_1_1 <- function(test, ref){
    te <- as.vector(test)
    re <- as.vector(ref)
    if(te[1] >= re[1] & te[2] <= re[2]){ return(TRUE) } else { return(FALSE) }
}


incl_1_n <- function(test, ref){
    testn <- adply(.data = ref, 1, incl_1_1, test = test)
    return(any(testn[,ncol(testn)]))
}

incl_n_n <- function(test, ref){

    testnn <- aaply(.data = test, 1, incl_1_n, ref, .expand = FALSE)
    names(testnn) <- NULL
    return(testnn)
}

flat_incl_n_n <- function(test, ref){

    ref <- interval_union(ref)
    return(incl_n_n(test, ref))

}


# testing for a vector, instead of an interval set
incl_x_1 <- function(x, ref){

    test <- (x>=ref[1,1] & x<ref[1,2])
    return(test)

}

incl_x_n <- function(x, ref){

    test <- any(x>=ref[,1] & x<ref[,2])
    return(test)

}

推荐答案

我认为您也许可以充分利用

I think you might be able to make good use of the many interval-related functions in the sets package.

这是一个小例子,说明程序包对区间构造,交集,集差,并集和补码的支持,以及对区间中包含的测试.这些和许多其他相关功能记录在?interval的帮助页面上.

Here's a small example illustrating the package's support for interval construction, intersection, set difference, union, and complementation, as well as its test for inclusion in an interval. These and many other related functions are documented on the help page for ?interval.

library(sets)
i1 <- interval(1,6)
i2 <- interval(5,10)
i3 <- interval(200,400)
i4 <- interval(202,402)
i5 <- interval_union(interval_intersection(i1,i2), 
                     interval_symdiff(i3,i4))

i5
# [5, 6] U [200, 202) U (400, 402]
interval_complement(i5)
# [-Inf, 5) U (6, 200) U [202, 400] U (402, Inf]

interval_contains_element(i5, 5.5)
# [1] TRUE
interval_contains_element(i5, 201)
# [1] TRUE

如果您的间隔当前被编码为两列data.frame,则可以使用类似mapply()的方式将其转换为sets包所使用的类型的间隔:

If your intervals are currently encoded in a two-column data.frame, you could use something like mapply() to convert them to intervals of the type used by the sets package:

df   <- data.frame(lBound = c(1,5,100), uBound = c(10, 6, 200))
Ints <- with(df, mapply("interval", l=lBound, r=uBound, SIMPLIFY=FALSE))
Ints
# [[1]]
# [1, 10]

# [[2]]
# [5, 6]

# [[3]]
# [100, 200]

这篇关于间隔在R中设置代数(联合,交集,差,包含,...)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆