Data.table self-join on condition using a matrix


Question


I am trying to join a data.table with itself. The join condition depends on a column (not the key) whose values are used to index into a matrix. Each row has a time (in seconds), and records should only join with newer records (t1

cn <- unique(sdd$column)
# lde is not defined in the question; presumably length(cn)
mat <- matrix(data = 0, nrow = lde, ncol = lde, dimnames = list(cn, cn))
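
As a minimal sketch of how such a lookup matrix can be built and filled, the snippet below uses made-up equipment labels and assumes that `lde` stands for the number of distinct labels. Both the labels and the single connection set here are hypothetical.

```r
# Hypothetical equipment labels; in the question these come from a column of sdd.
equipment <- c("E1", "E2", "E3", "E2", "E1")

cn  <- unique(equipment)   # distinct labels, in order of first appearance
lde <- length(cn)          # assumed meaning of `lde`
mat <- matrix(data = 0, nrow = lde, ncol = lde, dimnames = list(cn, cn))

# Mark one pair as connected (an arbitrary choice for illustration).
mat["E1", "E2"] <- 1

dim(mat)          # number of rows and columns
mat["E1", "E2"]   # look up a single pair by name
```

Because the matrix carries dimnames, cells can be addressed by label rather than by position.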

I have been struggling with the documentation FAQ (including the SQL to data.table analogy), the Beginner's Guide for data.table, and multiple similar questions on this forum, but I can't find how to solve it.

Q1, Q2, Q3

I get stuck with notation problems.

My DT is sdd:

> colnames(sdd)
[1] "ID" "DelayStartTimeSeconds" "DelayEndTimeSeconds" "EquipmentID"

I have made several attempts such as:

sdd2<-sdd # to avoid problems with the names of variables
sdd[sdd2,eqXrossM[cbind(sdd.EquipmentID,sdd2.EquipmentID)]==1 & sdd.DelayEndTimeSeconds<sdd2.DelayStartTimeSeconds, distance:=sdd.DelayEndTimeSeconds-sdd2.DelayStartTimeSeconds][,distance:=sdd.DelayEndTimeSeconds<sdd2.DelayStartTimeSeconds] # that would be the whole thing to do generating a new column with the time difference

sdd[sdd2[ sdd.DelayEndTimeSeconds<sdd2.DelayStartTimeSeconds, distance:=sdd.DelayEndTimeSeconds<sdd2.DelayStartTimeSeconds]] #this is an approximation attempt.

I simply don't get the notation and the different examples seem to use different notations.

EDIT: Well, after a night of sleep some of this is making sense, but other parts are still just as confusing. For example:

sdd_x<-sdd[sdd2,i.DelayStartTimeSeconds>DelayEndTimeSeconds] # returns a vector as long as sdd: len
sdd_x<-sdd[sdd2,i.DelayStartTimeSeconds>DelayEndTimeSeconds & eqXrossM[i.EquipmentID,EquipmentID]==1] # returns a matrix len x len.

Why does adding a new condition change the type of the output? I was expecting a result like the matrix (which would then need optimizing). In addition, the whole matrix is FALSE, which is not what I expected, since the records are different. In fact, in the second case either the upper or the lower triangle should be TRUE.

Also, it looks like the call to the matrix doesn't require cbind, unlike what another answer to a similar question mentioned. Why is that?
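
A small base R sketch may help with both observations. It uses a made-up 3 x 3 matrix (the names `m`, `from` and `to` are hypothetical) to show that `m[from, to]` and `m[cbind(from, to)]` are both valid but mean different things: the first selects a whole sub-matrix, the second looks up one cell per pair. Combining a vector with a sub-matrix through `&` then yields a matrix, which would explain the change in output type.

```r
# Made-up connection matrix with named rows and columns.
labs <- c("A", "B", "C")
m <- matrix(0, nrow = 3, ncol = 3, dimnames = list(labs, labs))
m["A", "B"] <- 1
m["B", "C"] <- 1

from <- c("A", "B", "C")
to   <- c("B", "C", "A")

# Two index vectors: every `from` row crossed with every `to` column.
sub <- m[from, to]

# One two-column index matrix: one cell per (from, to) pair.
pair <- m[cbind(from, to)]

# A logical vector combined with a logical matrix keeps the matrix shape.
flag <- c(TRUE, TRUE, TRUE) & (sub == 1)

dim(sub)
pair
is.matrix(flag)
```

Note that this relies on the index values being character labels that match the dimnames. If the ID column were integer, the same calls would index by position instead.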

My latest discovery is the CJ() function, but trying to use the i. notation with it doesn't work. This part doesn't seem to be well documented.

sdd[CJ(DASDelayID,DASDelayID),i.DelayStartTimeSeconds>DelayEndTimeSeconds]

Any help would be appreciated.

Solution

This is how I finally solved the problem:

sddx <- CJ(ID1 = sdd$DASDelayID, ID2 = sdd$DASDelayID)[
    ID1 < ID2][,
              ':='(Connected = eqXrossM[cbind(sdd[DASDelayID == ID2, EquipmentID], sdd[DASDelayID == ID1, EquipmentID])] == 1,
                   Distance = as.integer(sdd[DASDelayID == ID2, DelayStartTimeSeconds] - sdd[DASDelayID == ID1, DelayEndTimeSeconds]))
              ]
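
For readers who want something to run, here is a sketch of the same idea on made-up data. It is not the code above verbatim: the per-ID lookups are done with `match()` instead of `sdd[DASDelayID == ID2, ...]`, because, as far as I know, recent data.table versions reject a logical `i` whose length differs from the number of rows. The data values, the connections in `eqXrossM`, and the helper names `pairs`, `r1` and `r2` are all assumptions for illustration.

```r
library(data.table)

# Hypothetical delays; column names follow the answer, values are made up.
sdd <- data.table(
  DASDelayID            = 1:4,
  DelayStartTimeSeconds = c(0L, 100L, 250L, 400L),
  DelayEndTimeSeconds   = c(50L, 300L, 350L, 450L),
  EquipmentID           = c("E1", "E2", "E1", "E3")
)

# Hypothetical connection matrix, indexed as [later equipment, earlier equipment]
# to mirror the cbind(ID2 equipment, ID1 equipment) order used in the answer.
eq <- unique(sdd$EquipmentID)
eqXrossM <- matrix(0, length(eq), length(eq), dimnames = list(eq, eq))
eqXrossM["E2", "E1"] <- 1
eqXrossM["E3", "E1"] <- 1

# All ordered pairs of delay IDs with ID1 < ID2.
pairs <- CJ(ID1 = sdd$DASDelayID, ID2 = sdd$DASDelayID)[ID1 < ID2]

# Row positions in sdd for each side of every pair.
r1 <- match(pairs$ID1, sdd$DASDelayID)
r2 <- match(pairs$ID2, sdd$DASDelayID)

pairs[, `:=`(
  # One matrix cell per pair, thanks to the two-column index built by cbind.
  Connected = eqXrossM[cbind(sdd$EquipmentID[r2], sdd$EquipmentID[r1])] == 1,
  # Gap between the end of the earlier delay and the start of the later one.
  Distance  = as.integer(sdd$DelayStartTimeSeconds[r2] - sdd$DelayEndTimeSeconds[r1])
)]

print(pairs)
```

The `match()` lookup assumes that `DASDelayID` is unique in `sdd`, so each ID maps to exactly one row.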

Step by step:

Generate all the combinations of DelayID. The number of rows is large, but each row has only two integer columns.

sddx <- CJ(ID1 = sdd$DASDelayID, ID2 = sdd$DASDelayID)

This cuts the size roughly in half. It is safe because the IDs are assigned as the delays are created, so they are ordered by DelayStartTime, and DelayEndTime > DelayStartTime.

[ID1<ID2] 
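
To make the size reduction concrete, here is a small sketch with five hypothetical IDs (the names `ids`, `all_pairs` and `ordered_pairs` are made up). For n IDs, the cross join has n^2 rows and the `ID1 < ID2` filter keeps n(n-1)/2 of them.

```r
library(data.table)

ids <- 1:5                               # hypothetical delay IDs

all_pairs     <- CJ(ID1 = ids, ID2 = ids)   # every combination, including ID1 == ID2
ordered_pairs <- all_pairs[ID1 < ID2]       # keep each unordered pair once

nrow(all_pairs)       # n^2
nrow(ordered_pairs)   # n * (n - 1) / 2
```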

This applies the external condition by looking it up in the matrix; note the cbind:

[,':='(Connected=eqXrossM[cbind(sdd[DASDelayID==ID2,EquipmentID],sdd[DASDelayID==ID1,EquipmentID])]==1,

This calculates the distance between delays, which can be used to filter out the pairs where it is not strictly positive:

Distance=as.integer(sdd[DASDelayID==ID2,DelayStartTimeSeconds]-sdd[DASDelayID==ID1,DelayEndTimeSeconds]))  
              ]
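
The final filtering step is not shown above. One way it might look, sketched on a small hand-built table with the same column names (the values and the name `valid` are hypothetical):

```r
library(data.table)

# Hypothetical result table shaped like sddx after the := step.
sddx <- data.table(
  ID1       = c(1L, 1L, 2L, 3L),
  ID2       = c(2L, 3L, 3L, 4L),
  Connected = c(TRUE, FALSE, TRUE, TRUE),
  Distance  = c(50L, 200L, -50L, 0L)
)

# Keep only connected pairs whose gap is strictly positive.
valid <- sddx[Connected == TRUE & Distance > 0]

print(valid)
```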

I hope it helps someone else.
