Using intervals to assign categorical values

Problem description

Take the following generic data:

A <- c(5,7,11,10,23,30,24,6)
B <- c(1,2,3,1,2,3,1,2)
C <- data.frame(A,B)

and the following intervals:

library(intervals)
interval1 <- Intervals(
  matrix(
    c(
      5, 15,
      15, 25,
      25, 35,
      35, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c( TRUE, FALSE ),
  type = "Z"
)
rownames(interval1) <- c("A","B","C", "D")

interval2 <- Intervals(
  matrix(
    c(
      0, 10,
      12, 20,
      22, 30,
      30, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c( TRUE, FALSE ),
  type = "Z"
)
rownames(interval2) <- c("P","Q","R", "S")

Now I want to create the following output table

So where an A value falls within both sets of intervals, I want to 'copy' all of its data to a line below. We also introduce data$X, which holds the interval1 label, and data$Y, which holds the interval2 label. Where a value does not fit within any of the intervals, I want to remove it from the data.frame.

I am not sure whether the break() function would be better for creating the intervals, or whether a dplyr function can be used to create the repeated data rows.

Solution

You can use foverlaps in data.table:

library(data.table)
C.DT <- data.table(C)
C.DT[, A1:=A] # required for `foverlaps` so we can do a range search

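# `D` and `E` are your interval matrices (the bounds of interval1 and interval2)
D <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)
E <- matrix(c(0, 10, 12, 20, 22, 30, 30, 100), ncol = 2, byrow = TRUE)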

I1 <- data.table(cbind(data.frame(D), idX=LETTERS[1:4], idY=NA))
I2 <- data.table(cbind(data.frame(E), idX=NA, idY=LETTERS[16:19]))

setkey(I1, X1, X2)  # set the keys on our interval ranges
setkey(I2, X1, X2)

rbind(
  foverlaps(C.DT, I1, by.x=c("A", "A1"), nomatch=0), # match every value in `C.DT$A` to the ranges in `I1` 
  foverlaps(C.DT, I2, by.x=c("A", "A1"), nomatch=0)
)[order(A, B), .(A, B, X=idX, Y=idY)]

Produces:

     A B  X  Y
 1:  5 1  A NA
 2:  5 1 NA  P
 3:  6 2  A NA
 4:  6 2 NA  P
 5:  7 2  A NA
 6:  7 2 NA  P
 7: 10 1  A NA
 8: 10 1 NA  P
 9: 11 3  A NA
10: 23 2  B NA
11: 23 2 NA  R
12: 24 1  B NA
13: 24 1 NA  R
14: 30 3  C NA
15: 30 3 NA  R
16: 30 3 NA  S
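
In the table above, 10 is matched to P and 30 is matched to both R and S, because foverlaps treats both interval endpoints as inclusive. The question declared its intervals with closed = c(TRUE, FALSE), that is, closed on the left and open on the right. A minimal sketch of one way to approximate that, assuming the values are integers (as type = "Z" suggests), is to reduce each upper bound by 1 before the join. Only interval2 is shown; the name half_open is introduced here for illustration.

```r
library(data.table)

# Data from the question
C.DT <- data.table(A = c(5, 7, 11, 10, 23, 30, 24, 6),
                   B = c(1, 2, 3, 1, 2, 3, 1, 2))
C.DT[, A1 := A]  # zero-width range [A, A] for the overlap join

# Bounds of interval2, with each upper bound reduced by 1 so that the
# half-open integer range [start, end) becomes the closed range [start, end - 1]
I2 <- data.table(X1 = c(0, 12, 22, 30),
                 X2 = c(10, 20, 30, 100) - 1,
                 idY = LETTERS[16:19])
setkey(I2, X1, X2)

half_open <- foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)[
  order(A, B), .(A, B, Y = idY)]
print(half_open)
```

With these adjusted bounds, 10 should no longer be matched to P, and 30 should be matched to S only. The adjustment is only valid for integer data; for real-valued data a different comparison would be needed.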

Note that you can easily change what appears in place of NA by modifying the steps where I1 and I2 are created.
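
As a rough illustration of that last point, the sketch below builds the two lookup tables with a placeholder string in place of NA. The value "none" and the name labelled are assumptions chosen for this example, not part of the original answer. The interval bounds are written out directly rather than taken from D and E.

```r
library(data.table)

# Data from the question
C.DT <- data.table(A = c(5, 7, 11, 10, 23, 30, 24, 6),
                   B = c(1, 2, 3, 1, 2, 3, 1, 2))
C.DT[, A1 := A]  # zero-width range [A, A] for the overlap join

# "none" is a hypothetical placeholder used where the original answer used NA
I1 <- data.table(X1 = c(5, 15, 25, 35), X2 = c(15, 25, 35, 100),
                 idX = LETTERS[1:4], idY = "none")
I2 <- data.table(X1 = c(0, 12, 22, 30), X2 = c(10, 20, 30, 100),
                 idX = "none", idY = LETTERS[16:19])
setkey(I1, X1, X2)
setkey(I2, X1, X2)

labelled <- rbind(
  foverlaps(C.DT, I1, by.x = c("A", "A1"), nomatch = 0),
  foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)
)[order(A, B), .(A, B, X = idX, Y = idY)]
print(labelled)
```

The rows are the same as before; only the filler shown in the column that did not take part in each match changes.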
