Extending data from one data frame to multiple rows in another in R
Question
I am new to R and to this list. I hope that the question that follows is not too basic or uninformed. I have been checking the archives for the past few hours to no avail, so here I am posting. Part of the issue is that I don't know the proper terminology for the functions I need, which makes searching difficult. That being said, here is what I need to solve:
I have a data frame that looks like the following:
     Subject Item Region   RT
13       102    1     R1 1245
14       102    4     R1 1677
15       102    7     R1 1730
25       103    1     R1  815
26       103    4     R1  828
27       103    7     R1  985
1489     102    1     R2  356
1490     102    4     R2  510
1491     102    7     R2  544
1501     103    1     R2  447
1502     103    4     R2  486
1503     103    7     R2  221
...
Each subject has an RT (reaction time) for multiple regions of one item. And each subject sees multiple items.
I wish to compute outliers and then normalize them (though I'm not really going to worry about that solution in this thread). As a first step, I used some simple functions to compute the mean and SD for each Region for each subject, collapsing across items (i.e., the average of all the RTs that subject has in that region):
Mean = with(test, aggregate(RT, by = list(Subject,Region),mean, na.rm=TRUE))
SD = with(test, aggregate(RT, by = list(Subject,Region),sd, na.rm=TRUE))
I then used cbind and did some renaming to get the data all in one dataframe:
   Subject Region      Mean        SD
1      102     R1 1143.7778 202.25530
2      102     R2  431.8611 125.84393
9      103     R1  923.0833 179.51098
10     103     R2  344.1667 146.51192
...
The issue is that I now need to associate all of the means with the correct regions for each subject. That is, I would like to generate output that looks like this (note that all Subject 102 Region R1s have the same mean and SD, but different RTs etc.):
     Subject Item Region   RT      Mean        SD
13       102    1     R1 1245 1143.7778 202.25530
14       102    4     R1 1677 1143.7778 202.25530
15       102    7     R1 1730 1143.7778 202.25530
25       103    1     R1  815  923.0833 179.51098
26       103    4     R1  828  923.0833 179.51098
27       103    7     R1  985  923.0833 179.51098
1489     102    1     R2  356  431.8611 125.84393
1490     102    4     R2  510  431.8611 125.84393
1491     102    7     R2  544  431.8611 125.84393
1501     103    1     R2  447  344.1667 146.51192
1502     103    4     R2  486  344.1667 146.51192
1503     103    7     R2  221  344.1667 146.51192
It seems that merge and cbind are not going to do the job of extending and matching one value to another. Perhaps I need to make use of melt or some function that uses a key?
I hope that someone can either point me to the relevant function for me to read up on so that I can try this on my own, or just help with some code.
Thanks for reading...
Recommended answer
You can accomplish this task with the ddply function from the plyr package, combined with the ave function:
# Sample data; the unnamed first column is read as row names
test <- read.table(text="
     Subject Item Region   RT
13       102    1     R1 1245
14       102    4     R1 1677
15       102    7     R1 1730
25       103    1     R1  815
26       103    4     R1  828
27       103    7     R1  985
1489     102    1     R2  356
1490     102    4     R2  510
1491     102    7     R2  544
1501     103    1     R2  447
1502     103    4     R2  486
1503     103    7     R2  221", header=TRUE)
library(plyr)
# Split by Subject and Region, then add the group mean and SD to every row
ddply(test, .(Subject, Region), transform, Mean=ave(RT), SD=ave(RT, FUN=sd))
   Subject Item Region   RT      Mean        SD
1      102    1     R1 1245 1550.6667 266.03822
2      102    4     R1 1677 1550.6667 266.03822
3      102    7     R1 1730 1550.6667 266.03822
4      102    1     R2  356  470.0000 100.17984
5      102    4     R2  510  470.0000 100.17984
6      102    7     R2  544  470.0000 100.17984
7      103    1     R1  815  876.0000  94.62029
8      103    4     R1  828  876.0000  94.62029
9      103    7     R1  985  876.0000  94.62029
10     103    1     R2  447  384.6667 143.07457
11     103    4     R2  486  384.6667 143.07457
12     103    7     R2  221  384.6667 143.07457
You can check these values against the aggregate calls from the question:
> with(test, aggregate(RT, by = list(Subject,Region),mean, na.rm=TRUE))
  Group.1 Group.2         x
1     102      R1 1550.6667
2     103      R1  876.0000
3     102      R2  470.0000
4     103      R2  384.6667
> with(test, aggregate(RT, by = list(Subject,Region),sd, na.rm=TRUE))
  Group.1 Group.2         x
1     102      R1 266.03822
2     103      R1  94.62029
3     102      R2 100.17984
4     103      R2 143.07457
As you can see, both the mean and the SD aggregated by Subject and Region are added as columns next to the rows of your data.frame (test). Note that ddply returns a new data frame, so assign the result if you want to keep it.
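The question wondered whether merge could do this. As a minimal sketch, assuming the same sample data, the aggregated summaries can also be joined back onto the original rows with base R's merge, since a many-to-one merge repeats each summary row for every matching row. The names means, sds, stats, and result are illustrative only and do not come from the original post.

```r
# Sample data from the answer above (first column is read as row names)
test <- read.table(text="
     Subject Item Region   RT
13       102    1     R1 1245
14       102    4     R1 1677
15       102    7     R1 1730
25       103    1     R1  815
26       103    4     R1  828
27       103    7     R1  985
1489     102    1     R2  356
1490     102    4     R2  510
1491     102    7     R2  544
1501     103    1     R2  447
1502     103    4     R2  486
1503     103    7     R2  221", header=TRUE)

# One summary row per Subject/Region combination
means <- aggregate(RT ~ Subject + Region, data = test, FUN = mean)
sds   <- aggregate(RT ~ Subject + Region, data = test, FUN = sd)

# Rename the summary columns so they do not clash with RT in test
names(means)[names(means) == "RT"] <- "Mean"
names(sds)[names(sds) == "RT"] <- "SD"

# Combine the two summaries, then join them onto every matching row of test
stats  <- merge(means, sds, by = c("Subject", "Region"))
result <- merge(test, stats, by = c("Subject", "Region"))
result
```

Unlike the ddply call, merge moves the key columns (Subject, Region) to the front and drops the original row names, so the column order differs even though the values should agree.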
Edit
If you want to deal with NA values, you may want to use the following edited code:
ddply(test, .(Subject, Region), transform,
Mean=ave(RT, FUN = function(x) mean(x, na.rm=TRUE)),
SD=ave(RT, FUN=function(x) sd(x, na.rm=TRUE)))
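The same NA handling can be sketched without plyr, since ave accepts grouping variables directly. This is only an illustration: the missing value is introduced artificially, and the helper names mean_na and sd_na are hypothetical.

```r
# Sample data from the answer above (first column is read as row names)
test <- read.table(text="
     Subject Item Region   RT
13       102    1     R1 1245
14       102    4     R1 1677
15       102    7     R1 1730
25       103    1     R1  815
26       103    4     R1  828
27       103    7     R1  985
1489     102    1     R2  356
1490     102    4     R2  510
1491     102    7     R2  544
1501     103    1     R2  447
1502     103    4     R2  486
1503     103    7     R2  221", header=TRUE)

# Hypothetical missing value, introduced only to show the effect of na.rm
test$RT[2] <- NA

# Illustrative helpers that ignore NA within each group
mean_na <- function(x) mean(x, na.rm = TRUE)
sd_na   <- function(x) sd(x, na.rm = TRUE)

# ave() returns a vector as long as RT, with each group's summary
# repeated on every row of that group (FUN must be passed by name)
test$Mean <- ave(test$RT, test$Subject, test$Region, FUN = mean_na)
test$SD   <- ave(test$RT, test$Subject, test$Region, FUN = sd_na)
test
```

Here the row with the missing RT still receives its group's Mean and SD, computed from the remaining observations, and the original row order and row names are preserved.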