Summarize data and keep date column value
Question
I asked a similar question before ("Summarize and count data in R with dplyr") and got an excellent answer, but I need some more guidance on summarizing data that includes dates.
Goal:
In my new dataset I have a column with dates recording when each event occurred. When I proceed as suggested in the other post, I get an error message:
Dataset:
structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18", "08.12.2015 19:40", "08.12.2015 19:40",
"22.12.2015 08:50", "22.12.2015 08:52", "05.01.2016 13:22",
"06.01.2016 09:18", "14.02.2016 22:47",
"20.02.2016 21:27", "01.04.2016 13:52", "24.07.2016 07:03"),
StimuliA = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L), StimuliB = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L), R2 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 0L), R3 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L,
0L, 0L, 0L, 0L), R4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), R5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), R6 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L), R7 = c(0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L), stims = c("0_0", "0_0", "1_0", "1_0", "1_1",
"1_1", "1_1", "1_1", "1_1", "1_2", "1_2", "1_2", "2_2")), .Names = c("User", "Date", "StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6", "R7", "stims"), row.names = c(NA, -13L), spec = structure(list(
cols = structure(list(User = structure(list(), class = c("collector_integer",
"collector")), Date = structure(list(), class = c("collector_character",
"collector")), StimuliA = structure(list(), class = c("collector_integer",
"collector")), StimuliB = structure(list(), class = c("collector_integer",
"collector")), R2 = structure(list(), class = c("collector_integer",
"collector")), R3 = structure(list(), class = c("collector_integer",
"collector")), R4 = structure(list(), class = c("collector_integer",
"collector")), R5 = structure(list(), class = c("collector_integer",
"collector")), R6 = structure(list(), class = c("collector_integer",
"collector")), R7 = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("User", "Date", "StimuliA", "StimuliB",
"R2", "R3", "R4", "R5", "R6", "R7")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"), class = c("tbl_df", "tbl", "data.frame"))
Code:
df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep="_"))
aggregate(. ~ User + stims, data=df, sum)
Error in Summary.factor(c(12L, 2L), na.rm = FALSE) :
‘sum’ not meaningful for factors
Question/Desired result:
In my result, I would like to keep the date on which the stimulus occurred (or, while StimuliA and StimuliB are both 0, the first date for that user).
User  Date              StimuliA  StimuliB  R2  R3  R4  R5  R6  R7
1     25.11.2015 13:59  0         0         1   0   0   0   0   1
1     07.12.2015 08:18  1         0         0   0   0   0   1   0
1     08.12.2015 19:40  0         1         0   2   0   0   1   1
2     05.01.2016 13:22  0         0         0   0   0   0   1   0
2     14.02.2016 22:47  0         1         2   0   0   0   0   0
2     24.07.2016 07:03  1         0         0   0   0   0   0   0
In this result table, the first line for each user holds the sums of R2-R7 while StimuliA and StimuliB are still 0. Each following line then holds, for one stimulus, the sums of R2-R7 recorded until the next stimulus occurs.
This was suggested in the previous post, but I am unable to make it work:
You don't want to work with dates as factors. Transform the date to a Date variable using as.Date (many posts on this on SO). One method then would be to separately aggregate the date variable by User and stims similar to above, taking the min rather than the sum. Then merge the two resulting data.frames. If this does not make sense, it might be worth asking a new question that links to this question, adding the additional problem of the date variable. Also include an example dataset that includes this variable. @lmo
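The approach described in the comment above could be sketched in base R roughly as follows. This is only an illustration under a few assumptions: the data frame is rebuilt by hand from the dput output (with plain numeric columns), the date strings are parsed with as.POSIXct rather than as.Date so that the time of day is kept, tz = "UTC" is an arbitrary choice, and the names key, value_cols, dates, sums and res are made up for the example.

```r
# Hand-built copy of the example data (assumption: same values as the dput above).
df <- data.frame(
  User = rep(1:2, times = c(7, 6)),
  Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:40", "22.12.2015 08:50",
           "22.12.2015 08:52", "05.01.2016 13:22", "06.01.2016 09:18",
           "14.02.2016 22:47", "20.02.2016 21:27", "01.04.2016 13:52",
           "24.07.2016 07:03"),
  StimuliA = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  StimuliB = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0),
  R2 = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0),
  R3 = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  R4 = 0, R5 = 0,
  R6 = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0),
  R7 = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)
df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_"))

# Parse "day.month.year hour:minute" strings; the time zone is an assumption.
df$Date <- as.POSIXct(df$Date, format = "%d.%m.%Y %H:%M", tz = "UTC")

# Grouping columns shared by both aggregations.
key <- df[c("User", "stims")]

# 1) Earliest date per User/stims group (min instead of sum).
dates <- aggregate(df["Date"], by = key, FUN = min)

# 2) Sums of the numeric columns per group, leaving Date out.
value_cols <- c("StimuliA", "StimuliB", paste0("R", 2:7))
sums <- aggregate(df[value_cols], by = key, FUN = sum)

# 3) Merge the two results and put the rows in chronological order per user.
res <- merge(dates, sums, by = c("User", "stims"))
res <- res[order(res$User, res$Date), ]
res
```

Keeping Date out of the sum step is what avoids the error shown earlier, since sum is only applied to the numeric columns.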
Recommended answer
One idea via dplyr would be to filter all non-stimuli rows and grab the first observation for each user (via slice). Then filter all the stimuli rows and combine the two with bind_rows, i.e.
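library(dplyr)

bind_rows(
  # first row per user where StimuliA and StimuliB (columns 3:4) are both 0
  df %>%
    filter(rowSums(.[3:4]) == 0) %>%
    group_by(User) %>%
    slice(1L),
  # every row where a stimulus occurred
  df %>%
    filter(rowSums(.[3:4]) != 0)) %>%
  arrange(User)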
which gives
# A tibble: 6 x 11
# Groups: User [2]
User Date StimuliA StimuliB R2 R3 R4 R5 R6 R7 stims
<int> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <chr>
1 1 25.11.2015 13:59 0 0 1 0 0 0 0 0 0_0
2 1 07.12.2015 08:18 1 0 0 0 0 0 0 0 1_0
3 1 08.12.2015 19:40 0 1 0 0 0 0 0 0 1_1
4 2 05.01.2016 13:22 0 0 0 0 0 0 1 0 1_1
5 2 14.02.2016 22:47 0 1 0 0 0 0 0 0 1_2
6 2 24.07.2016 07:03 1 0 0 0 0 0 0 0 2_2
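Note that the output above keeps the wanted dates, but R2-R7 are the values of the retained rows, not sums over each stimulus period. If the sums are needed as well, one possible variation is to group by User and stims and summarise. The following is a sketch only, assuming dplyr 1.0 or later (for across), rows already in chronological order within each user, a hand-built copy of the example data, and an arbitrary tz = "UTC"; res is an illustrative name.

```r
library(dplyr)

# Hand-built copy of the example data (assumption: same values as the dput above).
df <- data.frame(
  User = rep(1:2, times = c(7, 6)),
  Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:40", "22.12.2015 08:50",
           "22.12.2015 08:52", "05.01.2016 13:22", "06.01.2016 09:18",
           "14.02.2016 22:47", "20.02.2016 21:27", "01.04.2016 13:52",
           "24.07.2016 07:03"),
  StimuliA = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  StimuliB = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0),
  R2 = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0),
  R3 = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  R4 = 0, R5 = 0,
  R6 = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0),
  R7 = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(
    # running stimulus counter, as in the question
    stims = paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_"),
    # parse the date strings; the time zone is an assumption
    Date = as.POSIXct(Date, format = "%d.%m.%Y %H:%M", tz = "UTC")
  ) %>%
  group_by(User, stims) %>%
  summarise(
    Date = first(Date),                          # date of the first row in the period
    across(c(StimuliA, StimuliB, R2:R7), sum),   # sums over the period
    .groups = "drop"
  ) %>%
  arrange(User, Date)

res
```

Here first(Date) picks the date of the row that opens each period, which is the stimulus row itself or, before any stimulus, the user's first observation.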