Grouping data by time stamp and then activity type using R


Problem description

Ok, so here is the problem.

I have a dataset that lists the activity (of various types) associated with various IDs at various times. The dataset is actually a few tens of thousands of rows long and looks like this:

      ID        DATE_EVENT TIME_EVENT EVENT_TYPE
 1:   520424473 07/08/2014   09:28:16      9,210      
 2:   504344215 07/08/2014   09:10:27      1,000    
 3:   051745297 07/08/2014   09:40:16      1,000    
 4:   961837100 07/08/2014   09:44:13      1,000     
 5:   412980113 07/08/2014   09:40:59      1,000
 6:   051745297 07/08/2014   09:40:23      9,034
 7:   520424473 07/08/2014   09:28:22      1,000

What I would like to do is group things by ID, order them chronologically, and then compute statistics on how long was spent in each EVENT_TYPE across the whole data set (or, even better, in a range of EVENT_TYPES). I have used this before

library(data.table)
setDT(Allvol)[, list(mean = mean(volume, na.rm = T), 
                     sd = sd(volume, na.rm = T)), by = ID]

on some earlier data to group it by ID and then work out the mean and s.d. for each one. However, that dataset was slightly different: it had a column of volumes associated with EVENT_TYPES. I think I need something similar but am not sure how to approach this.

Any help is much appreciated!

Solution

You have not provided any data, but the following may be helpful:

volume = sample(1000:2000,100)
id = sample(1:10,100, replace=T)
allvol = data.frame(id, volume)

head(allvol)
  id volume
1  5   1946
2  6   1828
3  5   1851
4  6   1296
5  5   1285
6  8   1238

means = with(allvol, tapply(volume, id, mean))
sds = with(allvol, tapply(volume, id, sd))

outdf = data.frame(id=names(means), means, sds)

outdf
   id    means      sds
1   1 1566.000 397.5433
2   2 1504.818 368.3938
3   3 1660.600 328.4202
4   4 1518.308 265.1347
5   5 1482.000 309.9055
6   6 1342.800 281.8632
7   7 1555.444 232.2246
8   8 1556.667 286.3241
9   9 1588.500 283.5166
10 10 1505.867 348.3440
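
The same grouping idea might be extended to the timestamp question itself. The following is only a rough sketch using base R on the seven sample rows from the question. It rests on assumptions that the question does not confirm: that the dates are day/month/year, and that the "time spent" in an EVENT_TYPE means the gap between an event and the next event for the same ID (so each ID's last event has no duration). The names `events`, `stamp`, `duration`, `closed`, `total_secs` and `mean_secs` are made up for illustration.

```r
# Sample rows from the question; IDs kept as character so leading zeros survive.
events <- data.frame(
  ID         = c("520424473", "504344215", "051745297", "961837100",
                 "412980113", "051745297", "520424473"),
  DATE_EVENT = rep("07/08/2014", 7),
  TIME_EVENT = c("09:28:16", "09:10:27", "09:40:16", "09:44:13",
                 "09:40:59", "09:40:23", "09:28:22"),
  EVENT_TYPE = c("9,210", "1,000", "1,000", "1,000", "1,000", "9,034", "1,000"),
  stringsAsFactors = FALSE
)

# Assumption: day/month/year; use "%m/%d/%Y" instead if the dates are US-style.
events$stamp <- as.POSIXct(paste(events$DATE_EVENT, events$TIME_EVENT),
                           format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Order chronologically within each ID.
events <- events[order(events$ID, events$stamp), ]

# Seconds until the same ID's next event; NA for each ID's last event.
events$duration <- ave(as.numeric(events$stamp), events$ID,
                       FUN = function(x) c(diff(x), NA))

# Total and mean seconds per EVENT_TYPE, dropping the open-ended last events.
closed     <- events[!is.na(events$duration), ]
total_secs <- tapply(closed$duration, closed$EVENT_TYPE, sum)
mean_secs  <- tapply(closed$duration, closed$EVENT_TYPE, mean)

print(total_secs)
```

To look at a range of EVENT_TYPES, `closed` could be subset first (for example with `closed$EVENT_TYPE %in% c(...)`) before calling `tapply`. If a different definition of "time spent" applies, only the `duration` step would need to change.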
