How to get mean for all participants after selecting only a certain number of trials

Question

I have a dataset of 500 trials per participant that I want to sample from in various quantities (i.e. I want to sample the same number of trials from each participant) and then compute the mean for each participant. Instead, my code creates a separate file with a single mean for each participant and each value of "num": e.g. if the mean for participant 1 with 125 trials is 426, that is the whole file; then there is another file for participant 1 with 150 trials containing a single value, and so on for all participants. I was aiming for a single file for 125 with the means for all participants, then another file with the means for 150, etc.

num <- c(125,150,175,200,225,250,275,300,325,350,375,400)

Subset2 <- list()


for (x in 1:12){
  for (j in num){
   Subset2[[x]] <- improb2 %>% group_by(Participant) %>% sample_n(j) %>% summarise(mean = mean(RT))
  
  
}}

Here is a reproducible example:

RT <- sample(200:600, 10000, replace=T)
df <- data.frame(Participant= letters[1:20]) 
df <- as.data.frame(df[rep(seq_len(nrow(df)), each = 500),])

improb2 <- cbind(RT, df)
improb2 <- improb2 %>% rename(Participant = `df[rep(seq_len(nrow(df)), each = 500), ]`)

One of the desired data frames in Subset2 would look something like:

Subset2[[1]]

Participant  mean
   <chr>       <dbl>
 1 P001         475.
 2 P002         403.
 3 P003         481.
 4 P004         393.
 5 P005         376.
 6 P006         402.
 7 P007         497.
 8 P008         372.
 9 P010         341.

Recommended answer

This answer uses the tidyverse and outputs a list object data whose names are the sample sizes. Because those names are numbers, you have to use backticks to access each sample-size summary, e.g. data$`125`. data$`125` is a tibble. I left a comment in the function showing where you can convert it to a data.frame if you need one.

library(tidyverse)

num <- c(125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400)

# create function to sample data by certain size and summarize by mean
get_mean <- function(x, n) { 
  dplyr::group_by(x, Participant) %>% # group by participant
    dplyr::sample_n(n) %>% # randomly sample observations
    dplyr::summarize(mean = mean(RT), # get mean of RT
                     n = n(), # get sample size
                     .groups = "keep") %>% 
    dplyr::ungroup()
# add a pipe to as.data.frame if you don't want a tibble object
}

# create a list object where the names are the sample sizes
data <- lapply(setNames(num, num), function(sample_size) {get_mean(df, n = sample_size)})

head(data$`125`)

 Participant  mean     n
  <chr>       <dbl> <int>
1 V1           20.2   125
2 V10          19.9   125
3 V11          19.8   125
4 V12          20.2   125
5 V2           20.5   125
6 V3           20.0   125
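If "file" in the question means an actual file on disk, each element of the list can be written out separately. The following is a minimal sketch in base R, not part of the original answer: the small data list is a stand-in with made-up values, and the output folder and the means_<size>.csv file names are hypothetical.

```r
# Stand-in for the answer's `data` list (illustrative values only, not real results)
data <- list(
  `125` = data.frame(Participant = c("V1", "V2"), mean = c(20.2, 20.5), n = 125L),
  `150` = data.frame(Participant = c("V1", "V2"), mean = c(19.9, 20.1), n = 150L)
)

out_dir <- tempdir()  # assumption: replace with your own output folder

# Write one CSV per sample size, each holding the means for all participants
for (size in names(data)) {
  out_file <- file.path(out_dir, paste0("means_", size, ".csv"))  # hypothetical name
  write.csv(data[[size]], out_file, row.names = FALSE)
}

list.files(out_dir, pattern = "^means_")
```

Looping over names(data) rather than over positions keeps the sample size available for building the file name.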


Data

I wasn't 100% sure what your dataset looks like, but I believe it is something like this:

# create fake data for 45 participants with 500 obs per participant
df <- replicate(45, rnorm(500, 20, 4)) %>%
  as.data.frame.matrix() %>% 
  tidyr::pivot_longer(everything(), 
                      names_to = "Participant", # id column
                      values_to = "RT") %>% # value column
  dplyr::arrange(Participant)


head(df) # Participant repeated 500 times, with 500 values in RT
 Participant    RT
  <chr>       <dbl>
1 V1           24.7
2 V1           15.2
3 V1           21.1
4 V1           21.6
5 V1           20.3
6 V1           25.6

If your data has a similar structure (long format, with repeated participant IDs and a single column RT of values), then the above should work.
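For comparison, the nested loop in the question reassigns Subset2[[x]] on every pass of the inner loop, so each of the 12 elements ends up holding the result for the last sample size (400). A hedged sketch of a single-loop version on the question's own example data is shown here; it assumes dplyr 1.0.0 or later for slice_sample(), and the fixed seed is an addition for repeatability.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable

# Rebuild the question's example data: 20 participants, 500 trials each
improb2 <- data.frame(
  Participant = rep(letters[1:20], each = 500),
  RT = sample(200:600, 10000, replace = TRUE)
)

num <- c(125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400)

# One loop over the positions of num: element i holds the result for num[i]
Subset2 <- vector("list", length(num))
for (i in seq_along(num)) {
  Subset2[[i]] <- improb2 %>%
    group_by(Participant) %>%
    slice_sample(n = num[i]) %>%  # successor of sample_n()
    summarise(mean = mean(RT))
}
names(Subset2) <- num  # names become "125", "150", ...

Subset2[["125"]]  # one row per participant
```

This gives the same shape of result as the lapply() approach above: one data frame per sample size, each with one mean per participant.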
