How do I get a data.frame from R's aggregate function in the right format?

Problem description

I'm having trouble getting R's aggregate() function to return a data.frame in the format that I'd like.

Basically I run the aggregation like so:

aggregate(df$res, list(full$depth), summary)

where the res column contains TRUE, FALSE and NA. I want to calculate the number of times each value of res occurs according to the groups in depth, which are six numeric depth values 0, 5, 15, 30, 60 and 100. According to the help page on the aggregate function it coerces the by values to factors, so this oughtn't be a problem (as far as I can tell).

So I run the aggregate function and store it in a data.frame. This is fine; it runs without error. The summary displayed in the R console looks like this:

  Group.1  x.Mode x.FALSE x.TRUE x.NA's
1       0 logical       3     83      0
2       5 logical       3     83      0
3      15 logical       8     78      0
4      30 logical       5     79      2
5      60 logical       1     64     21
6     100 logical       1     24     61

Again, this is fine, and looks like what I want. But the data.frame containing the results actually has only two columns, and looks like this:

   Group.1       x
1        0 logical
2        5 logical
3       15 logical
4       30 logical
5       60 logical
6      100 logical
7                3
8                3
9                8
10               5
11               1
12               1
13              83
14              83
15              78
16              79
17              64
18              24
19               0
20               0
21               0
22               2
23              21
24              61

I understand from the aggregate() help page that:

If the by has names, the non-empty times are used to label the columns in the results, with unnamed grouping variables being named Group.i for by[[i]].

which suggests to me that if the by has names then the output data.frame would look more like the summary of it that gets printed to the R console (i.e. it'd have 5 columns including a column of counts for each level in by) than the two-column version it actually gets saved as. The trouble is that the help page doesn't explain at all what a named by variable is, especially if it's coerced to a list from a data.frame column as in my case.
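For illustration, a "named by" is simply a list whose elements have names. The sketch below uses a made-up data frame `df` (an assumption, since the asker's real data is not shown) and shows that naming the list only changes the label of the grouping column; it does not change how many value columns the result has.

```r
# Hypothetical stand-in data; the asker's real data frame is not shown.
df <- data.frame(
  depth = rep(c(0, 5, 15), each = 4),
  res   = c(TRUE, TRUE, FALSE, NA, TRUE, TRUE, TRUE, FALSE, NA, NA, TRUE, FALSE)
)

# Unnamed list: the grouping column is labelled Group.1
unnamed <- aggregate(df$res, by = list(df$depth), FUN = length)
names(unnamed)

# Named list: the grouping column takes the name given in the list
named <- aggregate(df$res, by = list(depth = df$depth), FUN = length)
names(named)
```

In both calls the result has one grouping column and one value column; only the grouping column's name differs.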

What do I need to do differently in order for the data.frame that results from aggregate() to have a column of counts for each level of by as the help suggests it could if I knew what I was doing?

Recommended answer

This is because the result of aggregate is fairly odd in this case, where the last column is actually a matrix that has four columns, so the result looks like a 5 column data frame, but it's really a 2 column data frame, where the 2nd column is a 4 wide matrix. Here is a workaround to convert it to a normal data.frame:

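# Stand-in for the asker's data: 100 random logical values in 4 groups
X <- aggregate(sample(c(TRUE, FALSE, NA), 100, replace = TRUE),
               list(rep(letters[1:4], 25)), summary)
# Keep every column except the last, then bind the matrix's columns on as ordinary columns
X <- cbind(X[-ncol(X)], X[[ncol(X)]])
str(X)
# Output as posted in the original answer (the counts depend on the random sample;
# R 4.0 and later would likely report chr rather than Factor columns):
# 'data.frame':  4 obs. of  5 variables:
# $ Group.1: chr  "a" "b" "c" "d"
# $ Mode   : Factor w/ 1 level "logical": 1 1 1 1
# $ FALSE  : Factor w/ 4 levels "10","4","6","8": 3 2 4 1
# $ TRUE   : Factor w/ 2 levels "15","8": 2 1 2 2
# $ NA's   : Factor w/ 4 levels "11","6","7","9": 1 2 4 3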

The oddness of the result is a function of summary returning a 4 length vector instead of a single value.
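To make the matrix column visible, here is a minimal sketch under a few assumptions: the data are randomly generated, the seed is arbitrary, and `count_values` is a hypothetical helper (not part of the original answer) that always returns three named counts, so every group yields a vector of the same length. It also shows `do.call(data.frame, X)` as one possible way to flatten the result, alongside the `cbind` workaround.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
res   <- sample(c(TRUE, FALSE, NA), 100, replace = TRUE)
group <- rep(letters[1:4], 25)

# Hypothetical helper: returns three named counts for any logical vector
count_values <- function(x) {
  c(n_false = sum(!x, na.rm = TRUE),
    n_true  = sum(x, na.rm = TRUE),
    n_na    = sum(is.na(x)))
}

X <- aggregate(res, by = list(group = group), FUN = count_values)
dim(X)          # 4 rows and only 2 columns: "group" and "x"
is.matrix(X$x)  # the second column is itself a matrix
dim(X$x)        # 4 rows, 3 columns of counts

# Flatten the matrix column into ordinary data frame columns
flat <- do.call(data.frame, X)
names(flat)     # "group" "x.n_false" "x.n_true" "x.n_na"
```

Because the function passed to `aggregate` returns more than one value per group, the values are stored as a matrix inside a single column, which is why the printed result appears to have more columns than the data frame really has.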
