Using intervals to assign categorical values
Problem description
Take the following generic data:
A <- c(5,7,11,10,23,30,24,6)
B <- c(1,2,3,1,2,3,1,2)
C <- data.frame(A,B)
and the following intervals:
library(intervals)
interval1 <- Intervals(
  matrix(
    c(
       5,  15,
      15,  25,
      25,  35,
      35, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c(TRUE, FALSE),
  type = "Z"
)
rownames(interval1) <- c("A","B","C", "D")
interval2 <- Intervals(
  matrix(
    c(
       0,  10,
      12,  20,
      22,  30,
      30, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c(TRUE, FALSE),
  type = "Z"
)
rownames(interval2) <- c("P","Q","R", "S")
Now I want to create the following output table.
Where an A value overlaps an interval in both sets, I want to 'copy' the whole row to a line below.
I also want to add data$X, which holds the interval1 label, and data$Y, which holds the interval2 label.
Where a value does not fall within any interval, I want to remove it from the data.frame.
I am not sure whether the break() function would be better suited to creating the intervals, or whether a dplyr function can be used to produce the repeated data rows.
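As a point of comparison, here is a minimal sketch of the binning approach the question hints at. It assumes that "break()" refers to base R's `cut()` and its `breaks` argument, which is a guess about the asker's intent. It only covers interval1, whose intervals are contiguous, and it assigns at most one label per value, so it does not by itself produce the duplicated rows.

```r
A <- c(5, 7, 11, 10, 23, 30, 24, 6)

# Breaks are the endpoints of interval1. right = FALSE makes every bin
# closed on the left and open on the right: [5,15), [15,25), [25,35), [35,100)
X <- cut(A,
         breaks = c(5, 15, 25, 35, 100),
         labels = c("A", "B", "C", "D"),
         right  = FALSE)

data.frame(A, X)
```

Values outside every bin come back as NA and could then be dropped with `!is.na(X)`. interval2 has gaps between its intervals, so it would need extra handling with this approach.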
You can use foverlaps in data.table:
library(data.table)
C.DT <- data.table(C)
C.DT[, A1 := A]  # required for `foverlaps` so we can do a range search: each value becomes the range [A, A1]
# `D` and `E` are your interval matrices (the two-column endpoint matrices behind interval1 and interval2);
# data.frame() names the columns of an unnamed matrix X1 and X2
I1 <- data.table(cbind(data.frame(D), idX = LETTERS[1:4], idY = NA))
I2 <- data.table(cbind(data.frame(E), idX = NA, idY = LETTERS[16:19]))
setkey(I1, X1, X2)  # set the keys on our interval ranges
setkey(I2, X1, X2)
rbind(
  foverlaps(C.DT, I1, by.x = c("A", "A1"), nomatch = 0),  # match every value in `C.DT$A` to the ranges in `I1`
  foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)   # same for the ranges in `I2`
)[order(A, B), .(A, B, X = idX, Y = idY)]
Produces:
     A B  X  Y
 1:  5 1  A NA
 2:  5 1 NA  P
 3:  6 2  A NA
 4:  6 2 NA  P
 5:  7 2  A NA
 6:  7 2 NA  P
 7: 10 1  A NA
 8: 10 1 NA  P
 9: 11 3  A NA
10: 23 2  B NA
11: 23 2 NA  R
12: 24 1  B NA
13: 24 1 NA  R
14: 30 3  C NA
15: 30 3 NA  R
16: 30 3 NA  S
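One detail worth checking in the output above: `foverlaps` treats intervals as closed on both ends, so 10 matches P and 30 matches both R and S, even though the question declares the intervals with `closed = c(TRUE, FALSE)`. The following is a self-contained sketch of one way to respect the half-open definition. It assumes `D` and `E` are the plain endpoint matrices from the question and relies on the values being integers (type "Z"), so that [a, b) is the same as [a, b - 1]. The column names `start` and `end` are illustrative choices, not part of the answer.

```r
library(data.table)

C.DT <- data.table(A = c(5, 7, 11, 10, 23, 30, 24, 6),
                   B = c(1, 2, 3, 1, 2, 3, 1, 2))
C.DT[, A1 := A]  # each point becomes the zero-width range [A, A1]

# Assumed definitions of `D` and `E`: the endpoint matrices from the question
D <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)
E <- matrix(c(0, 10, 12, 20, 22, 30, 30, 100), ncol = 2, byrow = TRUE)

# Shift the right endpoint down by 1 so the closed range [a, b - 1]
# stands in for the half-open integer interval [a, b)
I1 <- data.table(start = D[, 1], end = D[, 2] - 1,
                 idX = LETTERS[1:4], idY = NA_character_)
I2 <- data.table(start = E[, 1], end = E[, 2] - 1,
                 idX = NA_character_, idY = LETTERS[16:19])
setkey(I1, start, end)
setkey(I2, start, end)

res <- rbind(
  foverlaps(C.DT, I1, by.x = c("A", "A1"), nomatch = 0),
  foverlaps(C.DT, I2, by.x = c("A", "A1"), nomatch = 0)
)[order(A, B), .(A, B, X = idX, Y = idY)]
res
```

With this adjustment, 10 no longer matches P and 30 no longer matches R, because both sit on an excluded right endpoint. For non-integer data the shift by 1 would not be valid and a different strategy would be needed.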
Note that you can easily change what you get instead of NA by modifying the steps where I1 and I2 are created.
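As a small illustration of that note, the sketch below builds `I1` with a placeholder string in place of NA. The value "none" is a hypothetical choice, and `D` is assumed to be the endpoint matrix behind interval1.

```r
library(data.table)

# Assumed definition of `D`: the endpoint matrix behind interval1
D <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)

# Same construction as in the answer, but idY carries a placeholder
# label ("none" is hypothetical) instead of NA
I1 <- data.table(cbind(data.frame(D), idX = LETTERS[1:4], idY = "none"))
setkey(I1, X1, X2)
I1
```

Every row matched against `I1` would then show "none" in the Y column of the final result instead of NA.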