aggregate() puts multiple output columns in a matrix instead

Question

I want to compute multiple quantiles for a certain variable:

> res1 <- aggregate(airquality$Wind, list(airquality$Month), function (x) quantile(x, c(0.9, 0.95, 0.975)))
> head(res1)
  Group.1   x.90%   x.95% x.97.5%
1       5 16.6000 17.5000 18.8250
2       6 14.9000 15.5600 17.3650
3       7 14.3000 14.6000 14.9000
4       8 12.6000 14.0500 14.6000
5       9 14.9600 15.5000 15.8025

The result looks good at first, but aggregate() actually returns it in a very strange form: the last three columns are not columns of the data.frame but a single matrix!

> names(res1)
[1] "Group.1" "x"      
> dim(res1)
[1] 5 2
> class(res1[,2])
[1] "matrix"

This causes a lot of problems in further processing.

A few questions:


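  1. Why does aggregate() behave so strangely?

  2. Is there any way to persuade aggregate() to produce
    the result I expect?

  3. Or am I using the wrong function for this purpose?
    Is there another, preferred way to get the desired
    result?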

Of course I could transform the output of aggregate(), but I am looking for a simpler, more straightforward solution.

Answer

Q1: Why is the behavior so strange?

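This is actually documented behavior (see ?aggregate), though it may still be unexpected. The relevant argument is simplify, which defaults to TRUE: results are then simplified to a vector or a matrix where possible, so a function that returns several values per group produces a matrix column.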

If simplify is set to FALSE, aggregate() produces a list column instead in a case like this:

res2 <- aggregate(airquality$Wind, list(airquality$Month), function (x) 
  quantile(x, c(0.9, 0.95, 0.975)), simplify = FALSE)
str(res2)
# 'data.frame':  5 obs. of  2 variables:
#  $ Group.1: int  5 6 7 8 9
#  $ x      :List of 5
#   ..$ 1  : Named num  16.6 17.5 18.8
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
#   ..$ 32 : Named num  14.9 15.6 17.4
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
#   ..$ 62 : Named num  14.3 14.6 14.9
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
#   ..$ 93 : Named num  12.6 14.1 14.6
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
#   ..$ 124: Named num  15 15.5 15.8
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
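If you do end up with the simplify = FALSE list column, it can be turned back into ordinary columns. The following is a minimal sketch that is not part of the original answer. It assumes every group returns a vector of the same length. The names qmat and flat2, and the renaming of Group.1 to Month, are illustrative choices.

```r
# Recreate the list-column result (simplify = FALSE).
res2 <- aggregate(airquality$Wind, list(airquality$Month),
                  function(x) quantile(x, c(0.9, 0.95, 0.975)),
                  simplify = FALSE)

# Each element of res2$x is a named numeric vector of length 3, so
# rbind() stacks them into a 5 x 3 matrix with columns "90%", "95%", "97.5%".
qmat <- do.call(rbind, res2$x)
rownames(qmat) <- NULL  # drop the list element names carried over as row names

# Bind the grouping column back on; check.names = FALSE keeps "90%" as-is
# instead of converting it to the syntactic name "X90.".
flat2 <- data.frame(Month = res2$Group.1, qmat, check.names = FALSE)
str(flat2)
```

Stacking with rbind() relies on each group yielding the same set of quantile labels, which holds here because the same probabilities are used for every month.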






Having either a matrix or a list as a column may seem strange, but I presume it's more a case of "by design" than a bug or a flaw.

For instance, consider the following: we want to aggregate both the "Wind" and the "Temp" columns of the "airquality" dataset, and we know that each aggregation will produce multiple columns (as we would expect with quantile).

res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality, 
                  function (x) quantile(x, c(0.9, 0.95, 0.975)))
res3
#   Month Wind.90% Wind.95% Wind.97.5% Temp.90% Temp.95% Temp.97.5%
# 1     5  16.6000  17.5000    18.8250   74.000   77.500     79.500
# 2     6  14.9000  15.5600    17.3650   87.300   91.100     92.275
# 3     7  14.3000  14.6000    14.9000   89.000   91.500     92.000
# 4     8  12.6000  14.0500    14.6000   94.000   95.000     96.250
# 5     9  14.9600  15.5000    15.8025   91.100   92.550     93.000

In some ways, keeping these values as matrix columns makes sense, since the aggregated data remain easily accessible by their original column names:

res3$Temp
#       90%   95%  97.5%
# [1,] 74.0 77.50 79.500
# [2,] 87.3 91.10 92.275
# [3,] 89.0 91.50 92.000
# [4,] 94.0 95.00 96.250
# [5,] 91.1 92.55 93.000
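Because each matrix column is an ordinary matrix, it can also be indexed by its quantile labels. This small sketch reuses the res3 call from above. The particular indexing shown is only an illustration and does not come from the original answer.

```r
res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality,
                  function(x) quantile(x, c(0.9, 0.95, 0.975)))

# Only three real columns: Month plus one matrix column per variable.
ncol(res3)           # 3
dim(res3$Temp)       # 5 3

# Ordinary matrix indexing works on a matrix column, e.g. the 95th
# percentile of Temp for each month, labelled by month:
setNames(res3$Temp[, "95%"], res3$Month)
```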



Q2: How do you get the results as separate columns in a data.frame?

But in many cases a list column is just as awkward to deal with as a matrix column. If you want to "flatten" the matrix into separate columns, use do.call(data.frame, ...):

do.call(data.frame, res1)
#   Group.1 x.90. x.95. x.97.5.
# 1       5 16.60 17.50 18.8250
# 2       6 14.90 15.56 17.3650
# 3       7 14.30 14.60 14.9000
# 4       8 12.60 14.05 14.6000
# 5       9 14.96 15.50 15.8025
str(.Last.value)
# 'data.frame':  5 obs. of  4 variables:
#  $ Group.1: int  5 6 7 8 9
#  $ x.90.  : num  16.6 14.9 14.3 12.6 15
#  $ x.95.  : num  17.5 15.6 14.6 14.1 15.5
#  $ x.97.5.: num  18.8 17.4 14.9 14.6 15.8
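By default, data.frame() passes the generated names through make.names(), which is why "x.90%" becomes "x.90.". If you would rather keep the original labels, one option is to pass check.names = FALSE through do.call(). This is a sketch rather than something from the original answer, and the name flat is illustrative.

```r
res1 <- aggregate(airquality$Wind, list(airquality$Month),
                  function(x) quantile(x, c(0.9, 0.95, 0.975)))

# c() turns the data.frame into a plain list of arguments and appends
# check.names = FALSE, so data.frame() keeps "%" in the generated names.
flat <- do.call(data.frame, c(res1, check.names = FALSE))
names(flat)   # "Group.1" "x.90%" "x.95%" "x.97.5%"
```

Non-syntactic names like these must then be quoted with backticks or accessed with [[ ]], e.g. flat[["x.95%"]].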



Q3: Are there other alternatives?

As with most things in R, yes, of course. My preferred alternative would be the "data.table" package, with which you can do:

library(data.table)
as.data.table(airquality)[, as.list(quantile(Wind, c(.9, .95, .975))), 
                          by = Month]
#    Month   90%   95%   97.5%
# 1:     5 16.60 17.50 18.8250
# 2:     6 14.90 15.56 17.3650
# 3:     7 14.30 14.60 14.9000
# 4:     8 12.60 14.05 14.6000
# 5:     9 14.96 15.50 15.8025
str(.Last.value)
# Classes ‘data.table’ and 'data.frame':  5 obs. of  4 variables:
#  $ Month: int  5 6 7 8 9
#  $ 90%  : num  16.6 14.9 14.3 12.6 15
#  $ 95%  : num  17.5 15.6 14.6 14.1 15.5
#  $ 97.5%: num  18.8 17.4 14.9 14.6 15.8
#  - attr(*, ".internal.selfref")=<externalptr> 
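If you would rather stay in base R without aggregate(), a split-and-apply approach produces the same numbers as plain columns. This is a sketch of an alternative not mentioned in the original answer. The names probs, qs and res4 are illustrative.

```r
probs <- c(0.9, 0.95, 0.975)

# split() groups Wind by Month; sapply() returns a 3 x 5 matrix
# (quantiles x months), so t() makes each month a row.
qs <- t(sapply(split(airquality$Wind, airquality$Month), quantile, probs = probs))

# Month comes back from the row names; row.names = NULL resets them to 1..5.
res4 <- data.frame(Month = as.integer(rownames(qs)), qs,
                   check.names = FALSE, row.names = NULL)
str(res4)
```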
