Summarize data and keep date column value


Problem description

I asked a similar question before (Summarize and count data in R with dplyr) and got an excellent answer, but I need more guidance on summarizing when a date column is involved.

Goal:

My new dataset has a column with the date and time at which each event occurred. When I follow the approach suggested in the other post, I get an error message:

Dataset:

df <- structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,  2L, 2L, 2L),
Date = c("25.11.2015 13:59", "03.12.2015 09:32",  "07.12.2015 08:18", "08.12.2015 19:40", "08.12.2015 19:40",
"22.12.2015 08:50",  "22.12.2015 08:52", "05.01.2016 13:22", 
"06.01.2016 09:18", "14.02.2016 22:47",  
"20.02.2016 21:27", "01.04.2016 13:52", "24.07.2016 07:03"), 
    StimuliA = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L), StimuliB = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L), R2 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 1L, 0L), R3 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L), R4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L), R5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L), R6 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 
    0L, 0L, 0L, 0L), R7 = c(0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
    0L, 0L, 0L, 0L), stims = c("0_0", "0_0", "1_0", "1_0", "1_1", 
    "1_1", "1_1", "1_1", "1_1", "1_2", "1_2", "1_2", "2_2")), .Names = c("User",  "Date", "StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6",  "R7", "stims"), row.names = c(NA, -13L), spec = structure(list(
    cols = structure(list(User = structure(list(), class = c("collector_integer", 
    "collector")), Date = structure(list(), class = c("collector_character", 
    "collector")), StimuliA = structure(list(), class = c("collector_integer", 
    "collector")), StimuliB = structure(list(), class = c("collector_integer", 
    "collector")), R2 = structure(list(), class = c("collector_integer", 
    "collector")), R3 = structure(list(), class = c("collector_integer", 
    "collector")), R4 = structure(list(), class = c("collector_integer", 
    "collector")), R5 = structure(list(), class = c("collector_integer", 
    "collector")), R6 = structure(list(), class = c("collector_integer", 
    "collector")), R7 = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("User", "Date", "StimuliA", "StimuliB", 
    "R2", "R3", "R4", "R5", "R6", "R7")), default = structure(list(), class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"), class = c("tbl_df",  "tbl", "data.frame"))

Code:

df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep="_"))    
aggregate(. ~ User + stims, data=df, sum)
Error in Summary.factor(c(12L, 2L), na.rm = FALSE) : 
‘sum’ not meaningful for factors
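
The `.` on the left-hand side of the formula stands for every column not named on the right, so `Date` is passed to `sum` as well, and `sum` fails on a factor (or a character vector). A minimal sketch, assuming only the sums are wanted at this step, drops `Date` before aggregating. The data frame below is a compact re-creation of the question's data, and `flag`, `df_num` and `sums` are names introduced here for illustration only.

```r
# Compact re-creation of the question's 13 rows (helper names are illustrative).
flag <- function(rows) replace(integer(13), rows, 1L)  # 0/1 column with 1s at `rows`
df <- data.frame(
  User = rep(1:2, times = c(7, 6)),
  Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:40", "22.12.2015 08:50",
           "22.12.2015 08:52", "05.01.2016 13:22", "06.01.2016 09:18",
           "14.02.2016 22:47", "20.02.2016 21:27", "01.04.2016 13:52",
           "24.07.2016 07:03"),
  StimuliA = flag(c(3, 13)), StimuliB = flag(c(5, 10)),
  R2 = flag(c(1, 11, 12)), R3 = flag(c(6, 9)),
  R4 = flag(integer(0)), R5 = flag(integer(0)),
  R6 = flag(c(4, 8)), R7 = flag(c(2, 7)),
  stringsAsFactors = FALSE
)

# Period label, exactly as in the question.
df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_"))

# Drop the non-numeric Date column so that `.` only covers summable columns.
df_num <- df[setdiff(names(df), "Date")]
sums <- aggregate(. ~ User + stims, data = df_num, FUN = sum)
sums
```

This avoids the error but loses the date, which is what the rest of the thread addresses.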

Question/Desired result: I would like the result to keep the date on which each stimulus occurred (or, while StimuliA and StimuliB are both still 0, the first date for that user).

User    Date         StimuliA   StimuliB    R2  R3  R4  R5  R6  R7
 1  25.11.2015 13:59     0         0        1   0   0   0   0   1
 1  07.12.2015 08:18     1         0        0   0   0   0   1   0
 1  08.12.2015 19:40     0         1        0   2   0   0   1   1
 2  05.01.2016 13:22     0         0        0   0   0   0   1   0 
 2  14.02.2016 22:47     0         1        2   0   0   0   0   0
 2  24.07.2016 07:03     1         0        0   0   0   0   0   0

In this result table, we have the sum of the values (R2-R7) while StimuliA and StimuliB are still 0 (line 1). Then, for each stimulus, the sum of R2-R7 is recorded until the next stimulus occurs.

This was suggested in the previous post, but I am unable to make it work:


You don't want to work with dates as factors. Transform the date to a Date variable using as.Date (there are many posts on this on SO). One method would then be to aggregate the date variable separately by User and stims, similar to the above, taking the min rather than the sum, and then merge the two resulting data.frames. If this does not make sense, it might be worth asking a new question that links to this one and adds the problem of the date variable. Also include an example dataset that contains this variable. (comment by @lmo)
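
The comment's suggestion can be sketched in base R roughly as follows. This is one possible reading of it, not the commenter's own code. Because the values carry a time of day, the sketch parses them with `as.POSIXct` instead of `as.Date` (which would drop the time); the `"UTC"` time zone is an assumption. The data frame is a compact re-creation of the question's data, and `flag`, `first_date`, `sums` and `result` are illustrative names.

```r
# Compact re-creation of the question's 13 rows (helper names are illustrative).
flag <- function(rows) replace(integer(13), rows, 1L)  # 0/1 column with 1s at `rows`
df <- data.frame(
  User = rep(1:2, times = c(7, 6)),
  Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:40", "22.12.2015 08:50",
           "22.12.2015 08:52", "05.01.2016 13:22", "06.01.2016 09:18",
           "14.02.2016 22:47", "20.02.2016 21:27", "01.04.2016 13:52",
           "24.07.2016 07:03"),
  StimuliA = flag(c(3, 13)), StimuliB = flag(c(5, 10)),
  R2 = flag(c(1, 11, 12)), R3 = flag(c(6, 9)),
  R4 = flag(integer(0)), R5 = flag(integer(0)),
  R6 = flag(c(4, 8)), R7 = flag(c(2, 7)),
  stringsAsFactors = FALSE
)
df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_"))

# Parse "dd.mm.yyyy HH:MM" into a date-time (time zone is an assumption).
df$Date <- as.POSIXct(df$Date, format = "%d.%m.%Y %H:%M", tz = "UTC")

# Step 1: earliest date per User and period.
first_date <- aggregate(Date ~ User + stims, data = df, FUN = min)

# Step 2: sums of the numeric columns per User and period (Date excluded).
sums <- aggregate(. ~ User + stims, data = df[setdiff(names(df), "Date")], FUN = sum)

# Step 3: merge the two results and sort them chronologically within each user.
result <- merge(first_date, sums, by = c("User", "stims"))
result <- result[order(result$User, result$Date), ]
result
```

Each period starts at its stimulus row (or at the user's first row), so the minimum date of a period is the date of that row.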


Recommended answer

One idea via dplyr would be to filter all non-stimulus rows and grab the first observation for each user (via slice), then filter all the stimulus rows and combine the two with bind_rows, i.e.

library(dplyr)

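bind_rows(
  # Non-stimulus rows (StimuliA and StimuliB both 0): keep only the first one per user
  df %>%
    filter(rowSums(.[3:4]) == 0) %>%
    group_by(User) %>%
    slice(1L),
  # Every row in which a stimulus occurred (columns 3:4 are StimuliA and StimuliB)
  df %>%
    filter(rowSums(.[3:4]) != 0)) %>%
  arrange(User)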

which gives


# A tibble: 6 x 11
# Groups:   User [2]
   User             Date StimuliA StimuliB    R2    R3    R4    R5    R6    R7 stims
  <int>            <chr>    <int>    <int> <int> <int> <int> <int> <int> <int> <chr>
1     1 25.11.2015 13:59        0        0     1     0     0     0     0     0   0_0
2     1 07.12.2015 08:18        1        0     0     0     0     0     0     0   1_0
3     1 08.12.2015 19:40        0        1     0     0     0     0     0     0   1_1
4     2 05.01.2016 13:22        0        0     0     0     0     0     1     0   1_1
5     2 14.02.2016 22:47        0        1     0     0     0     0     0     0   1_2
6     2 24.07.2016 07:03        1        0     0     0     0     0     0     0   2_2
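
The result above keeps individual rows, so R2-R7 are not summed over each period. If the per-period sums are also wanted, one possible sketch (not part of the original answer) groups by User and stims and summarises in a single step. It assumes dplyr 1.0 or later for `across()` and `.groups`, and a `"UTC"` time zone; the data frame is a compact re-creation of the question's data, and `flag` and `result` are illustrative names.

```r
library(dplyr)

# Compact re-creation of the question's 13 rows (helper names are illustrative).
flag <- function(rows) replace(integer(13), rows, 1L)  # 0/1 column with 1s at `rows`
df <- data.frame(
  User = rep(1:2, times = c(7, 6)),
  Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:40", "22.12.2015 08:50",
           "22.12.2015 08:52", "05.01.2016 13:22", "06.01.2016 09:18",
           "14.02.2016 22:47", "20.02.2016 21:27", "01.04.2016 13:52",
           "24.07.2016 07:03"),
  StimuliA = flag(c(3, 13)), StimuliB = flag(c(5, 10)),
  R2 = flag(c(1, 11, 12)), R3 = flag(c(6, 9)),
  R4 = flag(integer(0)), R5 = flag(integer(0)),
  R6 = flag(c(4, 8)), R7 = flag(c(2, 7)),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    # Parse "dd.mm.yyyy HH:MM" (time zone is an assumption).
    Date  = as.POSIXct(Date, format = "%d.%m.%Y %H:%M", tz = "UTC"),
    # Period label, built the same way as in the question.
    stims = paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_")
  ) %>%
  group_by(User, stims) %>%
  summarise(
    Date = min(Date),                                   # date the period starts
    across(c(StimuliA, StimuliB, R2:R7), sum),          # sums within the period
    .groups = "drop"
  ) %>%
  arrange(User, Date)
result
```

The sums come straight from the sample data, so it is worth comparing them with the hand-written table in the question before relying on either.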

