Summarize and count data in R with dplyr
Problem description
Goal: Use dplyr to summarize/count the responses that follow each stimulus, aggregated into the row where that stimulus occurred.
Background: I got some excellent help in another question: "Loop through dataframe in R and measure time difference between two values".
Now I am working with the same (or a similar) dataset, and my goal is to count the users' responses to perceived stimuli, aggregated into the same row as the one where each stimulus occurred. The dataset looks like this:
dat <- structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), StimuliA = c(1L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), StimuliB = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), R2 = c(0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L
), R3 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), R4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), R5 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), R6 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), R7 = c(0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("User",
"StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6", "R7"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L), spec = structure(list(
cols = structure(list(User = structure(list(), class = c("collector_integer",
"collector")), StimuliA = structure(list(), class = c("collector_integer",
"collector")), StimuliB = structure(list(), class = c("collector_integer",
"collector")), R2 = structure(list(), class = c("collector_integer",
"collector")), R3 = structure(list(), class = c("collector_integer",
"collector")), R4 = structure(list(), class = c("collector_integer",
"collector")), R5 = structure(list(), class = c("collector_integer",
"collector")), R6 = structure(list(), class = c("collector_integer",
"collector")), R7 = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("User", "StimuliA", "StimuliB",
"R2", "R3", "R4", "R5", "R6", "R7")), default = structure(list(),
class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Desired output: a summarized table in which all responses are aggregated into the row of the stimulus that preceded them:
User StimuliA StimuliB R2 R3 R4 R5 R6 R7
1 1 0 0 0 0 0 0 1
1 1 0 1 1 0 0 1 0
1 0 1 1 2 0 0 1 0
1 0 1 0 0 0 0 0 0
2 1 0 3 0 0 0 0 0
2 0 1 1 0 0 0 2 0
In the sample, row 1 records a stimulus A and row 2 a response R7. The desired result therefore starts with a row that has a 1 at StimuliA and a 1 at R7. A new block then begins, because row 3 contains a new 1 for StimuliA.
In the end, every stimulus gets one row that sums the responses (R2 to R7) occurring after it, up to the next stimulus. The stimulus column (A or B) keeps the value 1.
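The rule above (each stimulus opens a block, and the responses that follow it are summed into that block) can be sketched with cumsum() on a toy vector. The data here are hypothetical, a single user with one stimulus column and one response column, not the dataset from the question:

```r
# Hypothetical toy data for one user: a stimulus flag and one response column
stim <- c(1, 0, 1, 0, 0, 1, 0)
resp <- c(0, 1, 0, 1, 1, 0, 1)

# Every stimulus row starts a new block; the rows after it inherit that block number
block <- cumsum(stim)
print(block)                      # 1 1 2 2 2 3 3

# Sum the responses within each block (one value per stimulus)
print(tapply(resp, block, sum))   # per-block sums: 1, 2, 1
```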
Question: I feel I can achieve this with the dplyr package, but my previous attempts have not produced much useful output. How would I structure the syntax with dplyr commands, or should I look for a solution in another direction? Would I mutate the existing data frame or create a new one?
Thanks for any input and help!
Recommended answer
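Here is a two-line solution in base R. First, create an ID that changes every time a new stimulus occurs: the running counts of each stimulus, cumsum(StimuliA) and cumsum(StimuliB), are joined with paste. Combined with User in the next step, this gives one group per user and stimulus occurrence.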
dat$stims <- with(dat, paste(cumsum(StimuliA), cumsum(StimuliB), sep="_"))
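To see what this ID looks like, here is a sketch that rebuilds only the User and stimulus columns of the question's data as a plain data.frame (an assumption; the original dput also carries readr metadata that is not needed here) and prints the distinct IDs. The last lines show that a single running counter, cumsum(StimuliA | StimuliB), splits the rows into the same blocks; that variant is an alternative, not part of the original answer:

```r
# User and stimulus columns from the question, rebuilt as a plain data.frame
dat <- data.frame(
  User     = rep(c(1L, 2L), times = c(12L, 8L)),
  StimuliA = c(1,0,1,0,0,0,0,0,0,0,0,0, 1,0,0,0,0,0,0,0),
  StimuliB = c(0,0,0,0,0,0,1,0,0,0,0,1, 0,0,0,0,1,0,0,0)
)

# The answer's ID: running count of A stimuli, "_", running count of B stimuli
dat$stims <- with(dat, paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_"))
print(unique(dat$stims))   # "1_0" "2_0" "2_1" "2_2" "3_2" "3_3"

# Alternative: one counter that increases whenever either stimulus occurs
dat$block <- with(dat, cumsum(StimuliA | StimuliB))
print(dat$block)           # 1 1 2 2 2 2 3 3 3 3 3 4 5 5 5 5 6 6 6 6
```

Both IDs change on exactly the same rows, so either can be used as the grouping key.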
Then use aggregate to sum all remaining columns (the stimulus and response counts) for each combination of User and the new ID:
aggregate(. ~ User + stims, data=dat, sum)
User stims StimuliA StimuliB R2 R3 R4 R5 R6 R7
1 1 1_0 1 0 0 0 0 0 0 1
2 1 2_0 1 0 1 1 0 0 1 0
3 1 2_1 0 1 1 2 0 0 1 0
4 1 2_2 0 1 0 0 0 0 0 0
5 2 3_2 1 0 3 0 0 0 0 0
6 2 3_3 0 1 1 0 0 0 2 0
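Since the question asks about dplyr, here is a sketch of the same aggregation with dplyr verbs, assuming dplyr 1.0.0 or later (for across() and the .groups argument of summarise()). It rebuilds the question's data as a plain data.frame, and the block column is a hypothetical helper name. Grouping by User before cumsum() makes the block counter restart for each user. summarise() returns a new data frame, so the original dat is left unchanged:

```r
library(dplyr)

# Data from the question, rebuilt as a plain data.frame
dat <- data.frame(
  User     = rep(c(1L, 2L), times = c(12L, 8L)),
  StimuliA = c(1,0,1,0,0,0,0,0,0,0,0,0, 1,0,0,0,0,0,0,0),
  StimuliB = c(0,0,0,0,0,0,1,0,0,0,0,1, 0,0,0,0,1,0,0,0),
  R2       = c(0,0,0,0,0,1,0,0,0,0,1,0, 0,1,1,1,0,1,0,0),
  R3       = c(0,0,0,0,1,0,0,0,1,1,0,0, 0,0,0,0,0,0,0,0),
  R4       = rep(0, 20),
  R5       = rep(0, 20),
  R6       = c(0,0,0,1,0,0,0,1,0,0,0,0, 0,0,0,0,0,0,1,1),
  R7       = c(0,1,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0)
)

result <- dat %>%
  group_by(User) %>%
  # A row with stimulus A or B opens a new block; following rows join it
  mutate(block = cumsum(StimuliA == 1 | StimuliB == 1)) %>%
  group_by(User, block) %>%
  # Sum every stimulus and response column within each block
  summarise(across(StimuliA:R7, sum), .groups = "drop") %>%
  select(-block)

print(result)
```

The six rows should match the desired output in the question; the helper block column is dropped at the end.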