Extending data from one data frame to multiple rows in another in R

Question

I am new to R and to this list. I hope that the question that follows is not too basic or uninformed. I have been checking the archives for the past few hours to no avail, so here I am posting. Part of the issue is that I don't know the proper terminology for the functions I need, which makes searching difficult. That being said, here is what I need to solve:

I have a data frame that looks like the following:

     Subject Item Region   RT
13       102    1     R1 1245
14       102    4     R1 1677
15       102    7     R1 1730
25       103    1     R1  815
26       103    4     R1  828
27       103    7     R1  985
1489     102    1     R2  356
1490     102    4     R2  510
1491     102    7     R2  544
1501     103    1     R2  447
1502     103    4     R2  486
1503     103    7     R2  221
...

Each subject has an RT (reaction time) for each of several regions of an item, and each subject sees multiple items.

I wish to compute outliers and then normalize them (though I'm not really going to worry about that solution in this thread). As a first step, I used some simple functions to compute the mean and SD for each Region for each subject, collapsing across items (i.e., the average of all the RTs that subject has in that region):

Mean = with(test, aggregate(RT, by = list(Subject,Region),mean, na.rm=TRUE))  
SD = with(test, aggregate(RT, by = list(Subject,Region),sd, na.rm=TRUE))  

I then used cbind and did some renaming to get the data all in one dataframe:

   Subject Region      Mean        SD
1      102     R1 1143.7778 202.25530
2      102     R2  431.8611 125.84393
9      103     R1  923.0833 179.51098
10     103     R2  344.1667 146.51192
...

The issue is that I now need to associate all of the means with the correct regions for each subject. That is, I would like to generate output that looks like this (note that all Subject 102 Region R1s have the same mean and SD, but different RTs etc.):

     Subject Item Region   RT      Mean        SD
13       102    1     R1 1245 1143.7778 202.25530
14       102    4     R1 1677 1143.7778 202.25530
15       102    7     R1 1730 1143.7778 202.25530
25       103    1     R1  815  923.0833 179.51098
26       103    4     R1  828  923.0833 179.51098
27       103    7     R1  985  923.0833 179.51098
1489     102    1     R2  356  431.8611 125.84393
1490     102    4     R2  510  431.8611 125.84393
1491     102    7     R2  544  431.8611 125.84393
1501     103    1     R2  447  344.1667 146.51192
1502     103    4     R2  486  344.1667 146.51192
1503     103    7     R2  221  344.1667 146.51192

It seems that merge and cbind are not going to do the job of extending and matching one value to another. Perhaps I need to make use of melt or some function that uses a key?

I hope that someone can either point me to the relevant function for me to read up on so that I can try this on my own, or just help with some code.

Thanks for reading...

Recommended answer

You could accomplish this with the ddply function from the plyr package, combined with the ave function:

test <- read.table(text="
Subject Item Region   RT  
13     102    1  R1 1245  
14     102    4  R1 1677  
15     102    7  R1 1730  
25     103    1  R1  815  
26     103    4  R1  828  
27     103    7  R1  985  
1489     102    1  R2 356  
1490     102    4  R2 510  
1491     102    7  R2 544  
1501     103    1  R2 447  
1502     103    4  R2 486  
1503     103    7  R2 221", header=TRUE)

library(plyr)
ddply(test, .(Subject, Region), transform, Mean=ave(RT), SD=ave(RT, FUN=sd))
   Subject Item Region   RT      Mean        SD
1      102    1     R1 1245 1550.6667 266.03822
2      102    4     R1 1677 1550.6667 266.03822
3      102    7     R1 1730 1550.6667 266.03822
4      102    1     R2  356  470.0000 100.17984
5      102    4     R2  510  470.0000 100.17984
6      102    7     R2  544  470.0000 100.17984
7      103    1     R1  815  876.0000  94.62029
8      103    4     R1  828  876.0000  94.62029
9      103    7     R1  985  876.0000  94.62029
10     103    1     R2  447  384.6667 143.07457
11     103    4     R2  486  384.6667 143.07457
12     103    7     R2  221  384.6667 143.07457

You can compare these values with the output of your aggregate calls:

> with(test, aggregate(RT, by = list(Subject,Region),mean, na.rm=TRUE))  
  Group.1 Group.2         x
1     102      R1 1550.6667
2     103      R1  876.0000
3     102      R2  470.0000
4     103      R2  384.6667
> with(test, aggregate(RT, by = list(Subject,Region),sd, na.rm=TRUE))
  Group.1 Group.2         x
1     102      R1 266.03822
2     103      R1  94.62029
3     102      R2 100.17984
4     103      R2 143.07457

As you can see, both the mean and the SD aggregated by Subject and Region are put into your data.frame (test).
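The question assumed that merge would not do the job, but because the aggregated table has exactly one row per Subject/Region pair, joining it back onto the original data on those two keys repeats each mean and SD across the matching rows. Here is a minimal base R sketch of that idea, using the same sample data. The names `stats` and `result` are illustrative and not part of the original answer:

```r
# Sample data from the question (first 12 rows only)
test <- data.frame(
  Subject = rep(c(102, 103, 102, 103), each = 3),
  Item    = rep(c(1, 4, 7), times = 4),
  Region  = rep(c("R1", "R2"), each = 6),
  RT      = c(1245, 1677, 1730, 815, 828, 985,
              356, 510, 544, 447, 486, 221)
)

# One row per Subject/Region combination for each summary
Mean <- aggregate(RT ~ Subject + Region, data = test, FUN = mean)
names(Mean)[3] <- "Mean"
SD <- aggregate(RT ~ Subject + Region, data = test, FUN = sd)
names(SD)[3] <- "SD"

# Combine the two summaries, then join them back onto the raw rows
stats  <- merge(Mean, SD, by = c("Subject", "Region"))
result <- merge(test, stats, by = c("Subject", "Region"))
print(result)
```

Note that merge sorts the result by the key columns and moves them to the front, so the row and column order differs from the ddply output even though the values are the same.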

Edit

If you need to handle NA values, you may want to use the following edited code:

ddply(test, .(Subject, Region), transform, 
      Mean=ave(RT, FUN = function(x) mean(x, na.rm=TRUE)),
      SD=ave(RT, FUN=function(x) sd(x, na.rm=TRUE)))
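If you would rather not depend on plyr, the same NA-aware summaries can probably be obtained with ave alone, since ave accepts grouping variables directly. This is only a sketch on the sample data, with one RT set to NA purely for illustration (that missing value is an assumption, not part of the original data):

```r
# Sample data from the question (first 12 rows only)
test <- data.frame(
  Subject = rep(c(102, 103, 102, 103), each = 3),
  Item    = rep(c(1, 4, 7), times = 4),
  Region  = rep(c("R1", "R2"), each = 6),
  RT      = c(1245, 1677, 1730, 815, 828, 985,
              356, 510, 544, 447, 486, 221)
)

# Hypothetical missing value, introduced only to demonstrate na.rm
test$RT[2] <- NA

# ave() returns a vector as long as RT, with each group's summary
# repeated for every row of that group; row order is preserved
test$Mean <- ave(test$RT, test$Subject, test$Region,
                 FUN = function(x) mean(x, na.rm = TRUE))
test$SD   <- ave(test$RT, test$Subject, test$Region,
                 FUN = function(x) sd(x, na.rm = TRUE))
print(test)
```

The row with the missing RT still receives its group's mean and SD, computed from the remaining values in that group.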
