R: first observation by group using data.table & self-join

Views: 94 | Published: 2017/3/12 11:29:37 | Tags: r, data.table, self-join


Problem description

I'm trying to get the first row of each group, where groups are defined by three variables, using a data.table.

I have a working solution:

library(data.table)

col1 <- c(1,1,1,1,2,2,2,2,3,3,3,3)
col2 <- c(2000,2000,2001,2001,2000,2000,2001,2001,2000,2000,2001,2001)
col4 <- c(1,2,3,4,5,6,7,8,9,10,11,12)
data <- data.frame(store=col1,year=col2,month=12,sales=col4)

# take the first row of each store/year/month group
solution1 <- data.table(data)[,.SD[1,],by="store,year,month"]

I used the slower approach suggested by Matthew Dowle in the following link:

http://stats.stackexchange.com/questions/7884/fast-ways-in-r-to-get-the-first-row-of-a-data-frame-grouped-by-an-identifier

I'm trying to implement the faster self join but cannot get it to work.

Does anyone have any suggestions?

Solution

Option 1 (using keys)

Set the key to store, year, month:

DT <- data.table(data, key = c('store','year','month'))

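Then you can use unique to create a data.table containing one row per unique combination of the key columns; the first row of each combination is kept. Note that since data.table 1.9.8, unique() compares all columns by default rather than only the key columns, so pass by = key(DT) explicitly:

unique(DT, by = key(DT))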
   store year month sales
1:     1 2000    12     1
2:     1 2001    12     3
3:     2 2000    12     5
4:     2 2001    12     7
5:     3 2000    12     9
6:     3 2001    12    11
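
As a quick check of the keyed approach, the sketch below rebuilds the example data (using rep() as shorthand for the vectors in the question, which is my own substitution) and deduplicates on the key columns. It assumes a recent data.table version in which by must be given explicitly to restrict the comparison to the key.

```r
library(data.table)

# Rebuild the example data from the question (rep() is shorthand, not the original code)
data <- data.frame(store = rep(c(1, 2, 3), each = 4),
                   year  = rep(c(2000, 2000, 2001, 2001), times = 3),
                   month = 12,
                   sales = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

DT <- data.table(data, key = c("store", "year", "month"))

# Deduplicate on the key columns only; the first row of each group is kept
first_rows <- unique(DT, by = key(DT))
print(first_rows)
```

Setting the key sorts the table, but ties keep their original order, so the row kept for each group is the one that appeared first in the input.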

To be explicit about which row is kept, you could instead use a self-join with mult = 'first' (the other options are 'all' and 'last').

# DT[, key(DT), with = FALSE] keeps only the key columns, so the join
# result does not end up with two sales columns
DT[unique(DT[,key(DT), with = FALSE]), mult = 'first']
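
The self-join can be split into two steps to make it easier to follow. This is only a sketch on the example data; the names keys_only, via_join and via_sd are introduced here for illustration, and the data is rebuilt with rep() as shorthand.

```r
library(data.table)

# Example data from the question (rep() is shorthand, not the original code)
data <- data.frame(store = rep(c(1, 2, 3), each = 4),
                   year  = rep(c(2000, 2000, 2001, 2001), times = 3),
                   month = 12,
                   sales = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
DT <- data.table(data, key = c("store", "year", "month"))

# Step 1: lookup table with one row per distinct key combination
keys_only <- unique(DT[, key(DT), with = FALSE])

# Step 2: join the lookup table back to DT, keeping only the first match
via_join <- DT[keys_only, mult = "first"]

# Compare against the .SD[1] approach from the question
via_sd <- DT[, .SD[1], by = key(DT)]
identical(via_join$sales, via_sd$sales)
```

Changing mult to 'last' in step 2 would return the last row of each group instead.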

Option 2 (No keys)

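Without setting a key, it is faster to use .I rather than .SD: .I gives the row numbers within the full table, so only one index per group is computed instead of materialising a subset of the data for every group.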

DTb <- data.table(data)
DTb[DTb[,list(row1 = .I[1]), by = list(store, year, month)][,row1]]
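
To show what the nested call does, the sketch below separates it into its two steps on the example data. The intermediate name idx is introduced here for illustration, and the data is rebuilt with rep() as shorthand.

```r
library(data.table)

# Example data from the question (rep() is shorthand, not the original code)
data <- data.frame(store = rep(c(1, 2, 3), each = 4),
                   year  = rep(c(2000, 2000, 2001, 2001), times = 3),
                   month = 12,
                   sales = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
DTb <- data.table(data)

# Step 1: for each group, record the position in DTb of its first row
idx <- DTb[, list(row1 = .I[1]), by = list(store, year, month)]
print(idx$row1)   # rows 1, 3, 5, 7, 9, 11 for this data

# Step 2: subset DTb by those row positions
first_rows <- DTb[idx$row1]
print(first_rows)
```

Because no key is set, the groups come back in order of first appearance rather than sorted order; for this data the two orders coincide.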
