Subset and recombine dataframes using data.table package or other solutions [R]
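Problem description
I am quite new to R and have a question about subsetting and recombining two dataframes, using a range of values of one of the variables. My two dataframes look like this: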
          x     y
 [1,] 79.00 19.63
 [2,] 79.01 19.58
 [3,] 79.02 19.57
 [4,] 79.03 19.58
 [5,] 79.04 19.60
 [6,] 79.05 19.65
 [7,] 79.06 19.67
 [8,] 79.07 19.70
 [9,] 79.08 19.67
[10,] 79.09 19.72
and
     id            min_x max_x
[1,] 7G005-1010-10 79.01 79.06
[2,] 7G100-0001-10 79.02 79.09
[3,] 8S010-1201-10 79.06 79.09
My purpose is to combine the two of them like this:
id            x     y
7G005-1010-10 79,01 19,58
7G005-1010-10 79,02 19,57
7G005-1010-10 79,03 19,58
7G005-1010-10 79,04 19,6
7G005-1010-10 79,05 19,65
7G005-1010-10 79,06 19,7
7G100-0001-10 79,02 19,57
...           ...   ...
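As you can see from the output of my dataframes, I tried to use the data.table package to find a way to solve my problem.
If anybody can tell me how to deal with it (with or without data.table), I would be grateful!
Thank you in advance.
Sorry for the poor English.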
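Solution
This isn't possible in data.table nicely. It's FR#203 to implement. You could try the xts package, as I think it has this operation.
One long and clunky way (untested) in data.table is as follows. Say your first table is P and the second table, containing the ranges, is R.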
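library(data.table)   # P and R are assumed to be data.tables
setkey(P, x)
# Sort P by x and mark it as sorted so future queries can use binary search on P
from = P[J(R$min_x), which = TRUE]
# Look up each min_x in the key of P, returning its row position. J stands for Join.
to = P[J(R$max_x), which = TRUE]
# Look up each max_x in the key of P, returning its row position.
len = to - from + 1
# Vectorized: for each range, the length to[i] - from[i] + 1
i = unlist(mapply("seq.int", from, to, SIMPLIFY = FALSE))
# For each range, the sequence from[i]:to[i]; then concatenate them all into one vector
cbind(rep(R$id, len), P[i])
# Use len to repeat the ids of R so they line up with the rows they match in P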
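For comparison, here is a minimal base R sketch of the same range lookup that does not rely on data.table. The sample values are copied from the question; the names pieces, hit and result are only illustrative, and it assumes P and R are plain data frames with the columns shown above. It is one possible approach, not the method the answer describes.

```r
# Sample data copied from the question
P <- data.frame(
  x = c(79.00, 79.01, 79.02, 79.03, 79.04, 79.05, 79.06, 79.07, 79.08, 79.09),
  y = c(19.63, 19.58, 19.57, 19.58, 19.60, 19.65, 19.67, 19.70, 19.67, 19.72)
)
R <- data.frame(
  id    = c("7G005-1010-10", "7G100-0001-10", "8S010-1201-10"),
  min_x = c(79.01, 79.02, 79.06),
  max_x = c(79.06, 79.09, 79.09),
  stringsAsFactors = FALSE
)

# For each range (row of R), keep the rows of P whose x lies in [min_x, max_x]
# and attach the id of that range (hypothetical helper names: pieces, hit)
pieces <- lapply(seq_len(nrow(R)), function(k) {
  hit <- P$x >= R$min_x[k] & P$x <= R$max_x[k]
  data.frame(
    id = rep(R$id[k], sum(hit)),
    P[hit, , drop = FALSE],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
})

# Stack the per-range pieces into one data frame with columns id, x, y
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

This scans P once per range, so it is simple rather than fast; for large tables the keyed lookup above should scale better.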