R DataTable连接和约束行 [英] R DataTable Join and constrain rows

查看:192
本文介绍了R DataTable连接和约束行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在一个数据表中总结一组观察结果,并且可以使用一些语法帮助。



我认为这和连接一样简单,我试图确定具体的观察日期,即使在当天的多个测量或传感器,看到具体的值。





  • 观察日期具有不同的度量计数(每日行数)

  • 'M'easurement列表示特定值



我已经创建了2个样本数据集,我希望这些数据集能够明确目标。我还创建了一个excel电子表格的图像,希望显示数据之间的关系。

  library(data.table )
raw< - data.table(
Date = as.Date(c(2013-5-4,2013-5-4,2013-5-4 2013-5-9,2013-5-9,2013-5-16,2013-5-16,2013-5-16,2013-5-30)),
S1 = c(4,2,3,1,1,8,7,3,3),
S2 = c(2,5,2,4,4,9,1,6, 4),
S3 = c(6,2,2,7,3,2,7,2,1)


总结< - data.table
Date = as.Date(c(2013-5-4,2013-5-9,2013-5-16,2013-5-30)),
M1 = c(FALSE,TRUE,TRUE,TRUE),
M2 = c(TRUE,FALSE,TRUE,FALSE),
M3 = c b M4 = c(TRUE,FALSE,FALSE,TRUE),
M5 = c(TRUE,FALSE,FALSE,FALSE),
M6 = c(TRUE,FALSE,TRUE,FALSE) $ b M7 = c(FALSE,TRUE,TRUE,FALSE),
M8 = c(FALSE,FALSE,TRUE,FALSE),
M9 = c(FALSE,FALSE,TRUE,FALSE) b $ b M10 = c(FALSE,FALSE,TRUE,FALSE)

img src =https://i.stack.imgur.com/bydHL.jpgalt =Excel图像显示数据集之间的关系>
Excel



Raw是测量输入。多次测量可在同一观察日期(即多行)进行。



总结是我希望得到的结果。总结行,并且'm'测量列仅指示在任何V列中的当天观察到值(跟随M,即M1,M2)。例如,在5/16的第一次和最后一次观察中看到数字2,但是在5/16的9个值中没有出现数字5.



我想我需要使用一个联接,但是如何计算M列逃避我。



任何帮助是非常感激。



问题:在数据科学或数学中是否有这种类型的操作的名称?



更新
我正在尝试下列

  setkey(raw,Date)
s< - data.table unique(raw $ Date))#获取唯一日期的数据类型
setkey(s,Date)
s [raw,M1:=(length(na.omit请注意,如果您要查看原始档案,请按照下列步骤操作:



,这些值不是5-4的预期值(应为FALSE)。我认为这是因为原始行不在我的匹配语句中约束。

 日期M1 
1:2013 -05-04 TRUE
2:2013-05-09 TRUE
3:2013-05-16 TRUE
4:2013-05-30 TRUE

我的猜测是我需要使用不同的东西来连接原始行。

解决方案

这似乎有效:

  raw [,lapply(1:10,`%in%`,unique(unlist(.SD))),by = Date] 


结果是

 日期V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 
1:2013-05-04假真假真假真假假假假
2:2013-05-09真假真假真假假真假假假
3:2013- 05-16 TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
4:2013-05-30 TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE

如果您希望使用M而不是V的列,请使用 c(M = 1:10)代替 1:10


I'd like to summarize a set of observations in a datatable and could use some help with the syntax.

I think this is as simple as a join but I'm trying to identify that specific values were seen on a specific observation DAY even if its across multiple measurements or sensors on that day.

  • observations are summarized by date
  • observations date have varied counts of measurements (rows per date)
  • 'M'easurement columns indicate that a specific value was observed in ANY sensor for the day.

I've created 2 sample sets of data that I hope will clarify the goal. I've also created an image of an excel spreadsheet that hopes to show the relationship between the data.

library(data.table)
raw <- data.table(
  Date = as.Date(c("2013-5-4","2013-5-4","2013-5-4", "2013-5-9","2013-5-9", "2013-5-16","2013-5-16","2013-5-16", "2013-5-30")),
  S1 = c(4, 2, 3, 1, 1, 8, 7, 3, 3),
  S2 = c(2, 5, 2, 4, 4, 9, 1, 6, 4),
  S3 = c(6, 2, 2, 7, 3, 2, 7, 2, 1)
)

summarized <- data.table(
  Date = as.Date(c("2013-5-4", "2013-5-9", "2013-5-16", "2013-5-30")),
  M1 = c(FALSE,TRUE,TRUE,TRUE),
  M2 = c(TRUE,FALSE,TRUE,FALSE),
  M3 = c(TRUE,TRUE,TRUE,TRUE),
  M4 = c(TRUE,FALSE,FALSE,TRUE),
  M5 = c(TRUE,FALSE,FALSE,FALSE),
  M6 = c(TRUE,FALSE,TRUE,FALSE),
  M7 = c(FALSE,TRUE,TRUE,FALSE),
  M8 = c(FALSE,FALSE,TRUE,FALSE),
  M9 = c(FALSE,FALSE,TRUE,FALSE),
  M10 = c(FALSE,FALSE,TRUE,FALSE)
)

Excel

Raw is the measurements input. Multiple measurements can happen on the same observation date (i.e. multiple rows).

Summarized is what I'm hoping to get out. Rows are summarized and the 'm'easurement columns merely indicate that the value (following the M, i.e. M1, M2) was observed on the day in any of the V columns. For example, the number 2 was seen on the first and last observation on 5/16, but the number 5 was not seen in any of the 9 values on 5/16.

I think I need to use a join but how to calculate the M columns escapes me.

Any help is much appreciated.

Question: is there a name for this type of operation in data science or mathematics?

Update: I'm trying the following

setkey(raw,Date)
s <- data.table( Date=unique(raw$Date)) # get a datatable of the unique dates
setkey(s,Date)
s[raw, M1:=(length(na.omit(match(c(raw$V1,raw$v2,raw$v3),1)))>=1)]

Note that the values are not what's expected for 5-4 (should be FALSE). I think this is becuase the raw rows are not being constrained in my match statement.

         Date   M1
1: 2013-05-04 TRUE
2: 2013-05-09 TRUE
3: 2013-05-16 TRUE
4: 2013-05-30 TRUE

My guess is I need to use something different to subset the raw rows in the join.

解决方案

This seems to work:

raw[,lapply(1:10,`%in%`,unique(unlist(.SD))),by=Date]

The result is

         Date    V1    V2   V3    V4    V5    V6    V7    V8    V9   V10
1: 2013-05-04 FALSE  TRUE TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
2: 2013-05-09  TRUE FALSE TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE
3: 2013-05-16  TRUE  TRUE TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
4: 2013-05-30  TRUE FALSE TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

If you want the columns with "M" instead of "V", use c(M=1:10) in place of 1:10.

这篇关于R DataTable连接和约束行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆