Summarize and count data in R with dplyr

Problem description

Goal: Summarize/count responses in the same row as the stimulus that occurred, using dplyr.

Background: I got some excellent help in another thread: "Loop through dataframe in R and measure time difference between two values".

Now I am working with the same (or a similar) dataset, and my goal is to count users' responses to perceived stimuli in the same row where the stimulus occurred. The dataset looks like this:

dat <- structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), StimuliA = c(1L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), StimuliB = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), R2 = c(0L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L
), R3 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), R4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), R5 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), R6 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), R7 = c(0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("User", 
"StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6", "R7"), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L), spec = structure(list(
    cols = structure(list(User = structure(list(), class = c("collector_integer", 
    "collector")), StimuliA = structure(list(), class = c("collector_integer", 
    "collector")), StimuliB = structure(list(), class = c("collector_integer", 
    "collector")), R2 = structure(list(), class = c("collector_integer", 
    "collector")), R3 = structure(list(), class = c("collector_integer", 
    "collector")), R4 = structure(list(), class = c("collector_integer", 
    "collector")), R5 = structure(list(), class = c("collector_integer", 
    "collector")), R6 = structure(list(), class = c("collector_integer", 
    "collector")), R7 = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("User", "StimuliA", "StimuliB", 
    "R2", "R3", "R4", "R5", "R6", "R7")), default = structure(list(), 
class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"))

Desired output: a summarized table with all responses aggregated in the same row as the stimulus that occurred:

User  StimuliA  StimuliB  R2  R3  R4  R5  R6  R7
1     1         0         0   0   0   0   0   1
1     1         0         1   1   0   0   1   0
1     0         1         1   2   0   0   1   0
1     0         1         0   0   0   0   0   0
2     1         0         3   0   0   0   0   0
2     0         1         1   0   0   0   2   0

In the sample, row 1 records a stimulus for A and row 2 records a 1 for R7. The desired result is therefore a single row with a 1 at StimuliA and a 1 at R7. A new group then starts, because row 3 has a new 1 for StimuliA.

In the end, every stimulus gets one row that sums the responses (R2-R7) that follow it. The value of the stimulus column (A or B) stays 1.

Question: I feel I can achieve this with the dplyr package, but my attempts so far have not produced much useful output. How would I structure the dplyr syntax, or should I look for a solution in another direction? Would I mutate the existing data frame or create a new one?

Thanks for all the input and help!

Recommended answer

Here is a two-line solution in base R. First, create an ID that is unique to each user-(new) stimulus combination. This is accomplished with paste and cumsum.

dat$stims <- with(dat, paste(cumsum(StimuliA), cumsum(StimuliB), sep="_"))
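
To see why this ID works, here is a minimal sketch on toy vectors (the names a, b, and ids and their values are hypothetical, not taken from the dataset). Each cumsum increases by one whenever its stimulus fires, so the pasted string changes exactly on the rows where a new stimulus appears and stays constant on the response rows that follow.

```r
# Toy stimulus indicators (hypothetical values, for illustration only)
a <- c(1L, 0L, 1L, 0L, 0L)
b <- c(0L, 0L, 0L, 1L, 0L)

# Running counts of stimuli seen so far; the pasted pair changes
# whenever either stimulus occurs, which marks the start of a new group
ids <- paste(cumsum(a), cumsum(b), sep = "_")
print(ids)
# [1] "1_0" "1_0" "2_0" "2_1" "2_1"
```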

Then use aggregate to sum the responses for each of the new IDs:

aggregate(. ~ User + stims, data=dat, sum)
  User stims StimuliA StimuliB R2 R3 R4 R5 R6 R7
1    1   1_0        1        0  0  0  0  0  0  1
2    1   2_0        1        0  1  1  0  0  1  0
3    1   2_1        0        1  1  2  0  0  1  0
4    1   2_2        0        1  0  0  0  0  0  0
5    2   3_2        1        0  3  0  0  0  0  0
6    2   3_3        0        1  1  0  0  0  2  0
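
Since the question asked about dplyr, the same idea could probably be expressed as below. This is a sketch rather than part of the original answer. It assumes dplyr 1.0 or later (for across and the .groups argument), uses a shortened copy of the first eight rows of the data with only some response columns, and introduces a hypothetical column name, stim_id, as a single per-user counter in place of the pasted ID.

```r
library(dplyr)

# Shortened sample: first eight rows of the question's data, fewer response columns
dat <- data.frame(
  User     = rep(1L, 8),
  StimuliA = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  StimuliB = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L),
  R2       = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
  R3       = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  R6       = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L),
  R7       = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)
)

result <- dat %>%
  group_by(User) %>%
  # stim_id (hypothetical name) increases each time either stimulus occurs
  mutate(stim_id = cumsum(StimuliA + StimuliB)) %>%
  group_by(User, stim_id) %>%
  # Sum every non-grouping column within each stimulus block
  summarise(across(everything(), sum), .groups = "drop")

print(result)
```

This creates a new data frame and leaves dat unchanged. Rows that come before a user's first stimulus would fall into a group with stim_id equal to 0, which may need to be filtered out depending on the data.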
