Adding a non-aggregated column to an aggregated data set based on the aggregation of another column


Question

Is it possible to use the aggregate function to add another column from the original data frame, without actually using that column to aggregate the data?

This is a very simplified version of the data that will help illustrate my question (let's call it data):

name      result.1    result.2    replicate    day     data.for.mean
"obj.1"   1           "good"      1            1        5
"obj.1"   1           "good"      2            1        7
"obj.1"   1           "great"     1            2        6
"obj.1"   1           "good"      2            2        9
"obj.1"   2           "bad"       1            1        10
"obj.1"   2           "not good"  2            1        6
"obj.1"   2           "bad"       1            2        5
"obj.1"   2           "not good"  2            2        3

"obj.2"   1           "excellent" 1            1        14
"obj.2"   1           "good"      2            1        10
"obj.2"   1           "good"      1            2        11
"obj.2"   1           "not bad"   2            2        7
"obj.2"   2           "bad"       1            1        4
"obj.2"   2           "bad"       2            1        3
"obj.2"   2           "horrible"  1            2        2
"obj.2"   2           "dismal"    2            2        1
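
For anyone who wants to reproduce the example, here is a minimal sketch that reads the table above into a data frame. It assumes the values are whitespace-separated exactly as shown, with the multi-word labels quoted; the choice of read.table is mine and is not something the question specifies.

```r
# Read the mock table from the question into a data frame called `data`.
# The R string uses single quotes so the double-quoted labels
# (e.g. "not good") are kept together as one field.
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
name      result.1 result.2    replicate day data.for.mean
"obj.1"   1        "good"      1         1   5
"obj.1"   1        "good"      2         1   7
"obj.1"   1        "great"     1         2   6
"obj.1"   1        "good"      2         2   9
"obj.1"   2        "bad"       1         1   10
"obj.1"   2        "not good"  2         1   6
"obj.1"   2        "bad"       1         2   5
"obj.1"   2        "not good"  2         2   3
"obj.2"   1        "excellent" 1         1   14
"obj.2"   1        "good"      2         1   10
"obj.2"   1        "good"      1         2   11
"obj.2"   1        "not bad"   2         2   7
"obj.2"   2        "bad"       1         1   4
"obj.2"   2        "bad"       2         1   3
"obj.2"   2        "horrible"  1         2   2
"obj.2"   2        "dismal"    2         2   1
')

str(data)  # inspect column names and types
```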

You'll notice that result.1 and result.2 are tied: if result.1 == 1, result.2 is a positive label such as good/great, and if result.1 == 2, result.2 is a negative label such as bad/not good. I need both of these columns in the aggregated data set, and it doesn't matter which value from result.2 is picked when the data is aggregated. I just need enough information to identify whether result.1's value of 1 means good or bad, and similarly for its value of 2. So the output could have "dismal" in every row where result.1 is 2.

The problem is that, since result.2 uses different names to identify good/bad, I cannot use it as a column to aggregate by.

Currently my aggregate function looks like this:

# Mean of data.for.mean for each name / result.1 / day combination
aggregated.data <- aggregate(data[c("data.for.mean")],
                             by = data[c("name", "result.1", "day")],
                             FUN = mean)

which would give a line of output such as this:

name     result.1    day    data.for.mean
"obj.1"  1           1      6

(All of the replicates for obj.1 with result.1 == 1 on day 1 have been averaged. They had values of 5 and 7 and were the first two rows in my mock data set.)

What I would like is a line of output such as this:

name     result.1    result.2    day    data.for.mean
"obj.1"  1           "good"      1      6

Again, "good" could be replaced with "great", "not bad", or "excellent", that is, any of the labels that correspond to result.1's value of 1.

What would be the best method of capturing information from result.2 and adding it to aggregated.data (the output of the aggregate function)?

Thanks.

Recommended answer

Here's a solution in base R, which uses merge followed by another aggregate:

agg.2 <- merge(aggregated.data, data[,names(data) != 'data.for.mean'])
aggregate(result.2 ~ name+result.1+day+data.for.mean, data=agg.2, FUN=sample, size=1)
##    name result.1 day data.for.mean  result.2
## 1 obj.2        2   2           1.5    dismal
## 2 obj.2        2   1           3.5       bad
## 3 obj.1        2   2           4.0       bad
## 4 obj.1        1   1           6.0      good
## 5 obj.1        1   2           7.5     great
## 6 obj.1        2   1           8.0  not good
## 7 obj.2        1   2           9.0   not bad
## 8 obj.2        1   1          12.0 excellent

Here is how it works:

The merge adds the result.2 values back in by joining on the shared columns (name, result.1, day), but it creates multiple rows wherever a group has more than one result.2 value. The second aggregate then selects one of these rows per group.
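
The row counts make the two steps concrete. The sketch below rebuilds the mock data from the question (the data.frame construction is my own reconstruction of that table) and counts rows after each step. The second aggregate uses head rather than sample so that the result is repeatable.

```r
# Mock data rebuilt from the question's table (a reconstruction, not the asker's code).
data <- data.frame(
  name          = rep(c("obj.1", "obj.2"), each = 8),
  result.1      = rep(rep(1:2, each = 4), times = 2),
  result.2      = c("good", "good", "great", "good",
                    "bad", "not good", "bad", "not good",
                    "excellent", "good", "good", "not bad",
                    "bad", "bad", "horrible", "dismal"),
  replicate     = rep(1:2, times = 8),
  day           = rep(rep(1:2, each = 2), times = 4),
  data.for.mean = c(5, 7, 6, 9, 10, 6, 5, 3, 14, 10, 11, 7, 4, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Step 0: the asker's aggregation, one row per name / result.1 / day group.
aggregated.data <- aggregate(data["data.for.mean"],
                             by = data[c("name", "result.1", "day")],
                             FUN = mean)

# Step 1: merge joins on the shared columns (name, result.1, day), so every
# original row reappears, now carrying its group mean.
agg.2 <- merge(aggregated.data, data[, names(data) != "data.for.mean"])

# Step 2: collapse back to one row per group, keeping one result.2 label.
final <- aggregate(result.2 ~ name + result.1 + day + data.for.mean,
                   data = agg.2, FUN = head, n = 1)

nrow(aggregated.data)  # 8 groups
nrow(agg.2)            # 16 rows, one per original row
nrow(final)            # 8 rows again
```

Because data.for.mean is constant within each name/result.1/day group after the merge, including it on the right-hand side of the formula does not split any group further.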

Since you say you don't care which of the relevant result.2 labels you get, I'm picking one at random with sample.

To return the first result.2 label, use head with n=1 instead:

aggregate(result.2 ~ name+result.1+day+data.for.mean, data=agg.2, FUN=head, n=1)

Similarly, to get the last such label, use tail with n=1.
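
For completeness, here is a small sketch of the tail variant. To keep it short it uses only the obj.1 rows of the mock data; that subset and the name `picked` are my own choices for illustration.

```r
# Only the obj.1 rows from the question's table (a shortened illustration).
data <- data.frame(
  name          = "obj.1",
  result.1      = rep(1:2, each = 4),
  result.2      = c("good", "good", "great", "good",
                    "bad", "not good", "bad", "not good"),
  replicate     = rep(1:2, times = 4),
  day           = rep(rep(1:2, each = 2), times = 2),
  data.for.mean = c(5, 7, 6, 9, 10, 6, 5, 3),
  stringsAsFactors = FALSE
)

aggregated.data <- aggregate(data["data.for.mean"],
                             by = data[c("name", "result.1", "day")],
                             FUN = mean)
agg.2 <- merge(aggregated.data, data[, names(data) != "data.for.mean"])

# Keep the last result.2 label found in each group.
picked <- aggregate(result.2 ~ name + result.1 + day + data.for.mean,
                    data = agg.2, FUN = tail, n = 1)
picked
```

Which label counts as "first" or "last" depends on the row order that merge produces, so head and tail are best treated as repeatable ways to pick some label rather than a specific one.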
