Subset and recombine dataframes using data.table package or other solutions [R]

Question

I am quite new to R and have a question about subsetting and recombining two data frames using a range of values of one of the variables. My two data frames look like this:

        x         y                         
 [1,] 79.00     19.63
 [2,] 79.01     19.58
 [3,] 79.02     19.57
 [4,] 79.03     19.58
 [5,] 79.04     19.60
 [6,] 79.05     19.65
 [7,] 79.06     19.67
 [8,] 79.07     19.70
 [9,] 79.08     19.67
[10,] 79.09     19.72

and

          id        min_x  max_x
[1,] 7G005-1010-10  79.01  79.06  
[2,] 7G100-0001-10  79.02  79.09
[3,] 8S010-1201-10  79.06  79.09

My purpose is to combine the two of them like this:

      id           x       y
7G005-1010-10   79.01   19.58
7G005-1010-10   79.02   19.57
7G005-1010-10   79.03   19.58
7G005-1010-10   79.04   19.60
7G005-1010-10   79.05   19.65
7G005-1010-10   79.06   19.67
7G100-0001-10   79.02   19.57
     ...         ...     ...

As you can see from the output of my data frames, I am trying to use the data.table package to solve my problem.

Well, if anybody can tell me how to deal with it (with or without data.table)!

Thank you in advance.

Sorry for the poor English.

Answer

This can't be done neatly in data.table at the moment; implementing it is feature request FR#203. You could try the xts package, as I think it has this operation.
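
Since the question allows solutions without data.table, here is a minimal base R sketch of the same range lookup. It is not part of the original answer. The data are copied from the question; the names `pieces`, `hit` and `result` are made up for illustration, and the approach assumes the tables are small enough that looping over the rows of R is acceptable.

```r
# Sample data copied from the question.
P <- data.frame(
  x = c(79.00, 79.01, 79.02, 79.03, 79.04, 79.05, 79.06, 79.07, 79.08, 79.09),
  y = c(19.63, 19.58, 19.57, 19.58, 19.60, 19.65, 19.67, 19.70, 19.67, 19.72)
)
R <- data.frame(
  id    = c("7G005-1010-10", "7G100-0001-10", "8S010-1201-10"),
  min_x = c(79.01, 79.02, 79.06),
  max_x = c(79.06, 79.09, 79.09),
  stringsAsFactors = FALSE
)

# For each range in R, keep the rows of P whose x lies in [min_x, max_x]
# and label them with that range's id.
pieces <- lapply(seq_len(nrow(R)), function(k) {
  hit <- P[P$x >= R$min_x[k] & P$x <= R$max_x[k], ]
  data.frame(id = rep(R$id[k], nrow(hit)), hit, stringsAsFactors = FALSE)
})

# Stack the per-range pieces into one data frame with columns id, x, y.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

Unlike the keyed lookup described next, this version does not require min_x and max_x to appear exactly in P$x, but it scans all of P once per range.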

One long and clunky way (untested) in data.table is as follows. Say your first table is P and the second table, containing the ranges, is R.

setkey(P,x)
# sort by x and mark as sorted so future queries can use binary search on P

from = P[J(R$min_x),which=TRUE]
# Lookup each min_x in the key of P, returning the location. J stands for Join.

to = P[J(R$max_x),which=TRUE]
# Lookup each max_x in the key of P, returning the location.

len = to-from+1
# vectorized: for each range, the number of matching rows, to[i]-from[i]+1

i = unlist(mapply("seq.int",from,to,SIMPLIFY=FALSE))
# for each range, the row sequence from[i]:to[i]; then concatenate them all into one vector

cbind(rep(R$id,len), P[i])
# repeat each id in R len times so it lines up with the rows it matches in P
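
The steps above can be tried on the sample data from the question roughly as follows. This is a sketch, not a tested recipe from the answer: it assumes P and R are built as data.tables, that every min_x and max_x occurs exactly in P$x (otherwise the keyed lookup returns NA and `seq.int` fails), and it builds the final table column by column instead of with `cbind`. The name `result` is made up for illustration.

```r
library(data.table)

# Sample data copied from the question.
P <- data.table(
  x = c(79.00, 79.01, 79.02, 79.03, 79.04, 79.05, 79.06, 79.07, 79.08, 79.09),
  y = c(19.63, 19.58, 19.57, 19.58, 19.60, 19.65, 19.67, 19.70, 19.67, 19.72)
)
R <- data.table(
  id    = c("7G005-1010-10", "7G100-0001-10", "8S010-1201-10"),
  min_x = c(79.01, 79.02, 79.06),
  max_x = c(79.06, 79.09, 79.09)
)

setkey(P, x)                                  # sort P by x and mark it as keyed

from <- P[J(R$min_x), which = TRUE]           # row of P where each range starts
to   <- P[J(R$max_x), which = TRUE]           # row of P where each range ends
len  <- to - from + 1L                        # number of rows of P in each range

# Row indices from[k]:to[k] for every range, concatenated into one vector.
i <- unlist(mapply(seq.int, from, to, SIMPLIFY = FALSE))

# Repeat each id len[k] times so it lines up with the selected rows of P.
result <- data.table(id = rep(R$id, len), x = P$x[i], y = P$y[i])
print(result)
```

Later data.table releases added non-equi joins (conditions such as `x >= min_x` in the `on=` argument), which may remove the need for the explicit `from`/`to` lookup; check the documentation of your installed version.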
