aggregate() puts multiple output columns in a matrix instead
Question
I want to compute multiple quantiles for a certain variable:
> res1 <- aggregate(airquality$Wind, list(airquality$Month), function (x) quantile(x, c(0.9, 0.95, 0.975)))
> head(res1)
Group.1 x.90% x.95% x.97.5%
1 5 16.6000 17.5000 18.8250
2 6 14.9000 15.5600 17.3650
3 7 14.3000 14.6000 14.9000
4 8 12.6000 14.0500 14.6000
5 9 14.9600 15.5000 15.8025
The result looks good at first, but aggregate() actually returns it in a very strange form: the last three columns are not columns of the data.frame, but a single matrix!
> names(res1)
[1] "Group.1" "x"
> dim(res1)
[1] 5 2
> class(res1[,2])
[1] "matrix"
This causes a lot of problems in further processing.
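A few questions:
- Why does aggregate() behave so strangely?
- Is there any way to persuade it to produce the result I expect?
- Or am I using the wrong function for this purpose? Is there another preferred way to get the desired result?
Of course I could transform the output of aggregate(), but I am looking for a simpler and more straightforward solution.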
Recommended answer
Q1: Why is the behavior so strange?
This is actually documented behavior (see ?aggregate), though it may still be unexpected. The relevant argument to look at is simplify: with the default simplify = TRUE, per-group results that all have the same length greater than one are simplified to a matrix, and that matrix is stored as a single column.
If simplify is set to FALSE, aggregate() produces a list column instead in a case like this:
res2 <- aggregate(airquality$Wind, list(airquality$Month), function (x)
quantile(x, c(0.9, 0.95, 0.975)), simplify = FALSE)
str(res2)
# 'data.frame': 5 obs. of 2 variables:
# $ Group.1: int 5 6 7 8 9
# $ x :List of 5
# ..$ 1 : Named num 16.6 17.5 18.8
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 32 : Named num 14.9 15.6 17.4
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 62 : Named num 14.3 14.6 14.9
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 93 : Named num 12.6 14.1 14.6
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
# ..$ 124: Named num 15 15.5 15.8
# .. ..- attr(*, "names")= chr "90%" "95%" "97.5%"
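As a quick check of the two shapes described above, the following sketch runs the same aggregation with both settings of simplify and inspects the second column. It uses only base R and the built-in airquality data; the names probs, res_matrix and res_list are introduced here for illustration and do not appear in the original answer.

```r
probs <- c(0.9, 0.95, 0.975)

# Default (simplify = TRUE): the three quantiles per group become one matrix column.
res_matrix <- aggregate(airquality$Wind, list(airquality$Month),
                        function(x) quantile(x, probs))

# simplify = FALSE: each group's result is kept as one element of a list column.
res_list <- aggregate(airquality$Wind, list(airquality$Month),
                      function(x) quantile(x, probs), simplify = FALSE)

is.matrix(res_matrix$x)  # the whole column is a single 5 x 3 matrix
is.list(res_list$x)      # the whole column is a list with 5 elements

# The underlying numbers are the same; only the container differs.
all.equal(unname(res_matrix$x[1, ]), unname(res_list$x[[1]]))
```

In both cases the data.frame still has only two columns, which is why names() and dim() report fewer columns than the printed output suggests.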
Now, both a matrix and a list as columns may seem like strange behavior, but I presume it is more a case of "working as designed" than a "bug" or a "flaw".
For instance, consider the following: we want to aggregate both the "Wind" and the "Temp" columns from the "airquality" dataset, and we know that each aggregation results in multiple columns (as we would expect with quantile):
res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality,
function (x) quantile(x, c(0.9, 0.95, 0.975)))
res3
# Month Wind.90% Wind.95% Wind.97.5% Temp.90% Temp.95% Temp.97.5%
# 1 5 16.6000 17.5000 18.8250 74.000 77.500 79.500
# 2 6 14.9000 15.5600 17.3650 87.300 91.100 92.275
# 3 7 14.3000 14.6000 14.9000 89.000 91.500 92.000
# 4 8 12.6000 14.0500 14.6000 94.000 95.000 96.250
# 5 9 14.9600 15.5000 15.8025 91.100 92.550 93.000
In some ways, keeping these values as matrix columns makes sense: the aggregated data are easily accessible by their original column names:
res3$Temp
# 90% 95% 97.5%
# [1,] 74.0 77.50 79.500
# [2,] 87.3 91.10 92.275
# [3,] 89.0 91.50 92.000
# [4,] 94.0 95.00 96.250
# [5,] 91.1 92.55 93.000
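Because each matrix column keeps the quantile labels as its own column names, individual quantiles can be pulled out with ordinary matrix indexing. The sketch below rebuilds the two-variable aggregation and shows two such lookups; temp95 and wind_july are illustrative names, not part of the original answer.

```r
res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality,
                  function(x) quantile(x, c(0.9, 0.95, 0.975)))

# One quantile for every month: index the matrix column by its column name.
temp95 <- res3$Temp[, "95%"]
temp95

# All three Wind quantiles for a single month (July): index by row instead.
wind_july <- res3$Wind[res3$Month == 7, ]
wind_july
```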
Q2: How do you get the results as separate columns in a data.frame?
In many cases, though, a list column is just as awkward to deal with as a matrix column. If you want to "flatten" your matrix into columns, use do.call(data.frame, ...). Note that data.frame() sanitizes column names by default, so x.90% becomes x.90. in the output:
do.call(data.frame, res1)
# Group.1 x.90. x.95. x.97.5.
# 1 5 16.60 17.50 18.8250
# 2 6 14.90 15.56 17.3650
# 3 7 14.30 14.60 14.9000
# 4 8 12.60 14.05 14.6000
# 5 9 14.96 15.50 15.8025
str(.Last.value)
# 'data.frame': 5 obs. of 4 variables:
# $ Group.1: int 5 6 7 8 9
# $ x.90. : num 16.6 14.9 14.3 12.6 15
# $ x.95. : num 17.5 15.6 14.6 14.1 15.5
# $ x.97.5.: num 18.8 17.4 14.9 14.6 15.8
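If the sanitized names (x.90. and so on) are unwanted, two small variations on the same idea may help. This is a sketch rather than part of the original answer: it passes check.names = FALSE through do.call(), or binds the grouping column to the matrix directly with cbind(). The names flat1 and flat2 are illustrative.

```r
# Recreate the aggregated result from the question.
res1 <- aggregate(airquality$Wind, list(airquality$Month),
                  function(x) quantile(x, c(0.9, 0.95, 0.975)))

# Variation 1: same do.call() approach, but check.names = FALSE keeps the
# "%" in the names, giving x.90%, x.95%, x.97.5%.
flat1 <- do.call(data.frame, c(as.list(res1), check.names = FALSE))
names(flat1)

# Variation 2: bind the grouping column to the matrix itself; the matrix's
# own column names (90%, 95%, 97.5%) are used without the "x." prefix.
flat2 <- cbind(res1["Group.1"], res1$x)
names(flat2)
```

Names containing "%" are not syntactic, so they need backticks or [[ ]] when accessed, for example flat2[["90%"]].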
Q3: Are there other alternatives?
As with most things in R, yes, of course. My preferred alternative is the "data.table" package, with which you can do:
library(data.table)
as.data.table(airquality)[, as.list(quantile(Wind, c(.9, .95, .975))),
by = Month]
# Month 90% 95% 97.5%
# 1: 5 16.60 17.50 18.8250
# 2: 6 14.90 15.56 17.3650
# 3: 7 14.30 14.60 14.9000
# 4: 8 12.60 14.05 14.6000
# 5: 9 14.96 15.50 15.8025
str(.Last.value)
# Classes ‘data.table’ and 'data.frame': 5 obs. of 4 variables:
# $ Month: int 5 6 7 8 9
# $ 90% : num 16.6 14.9 14.3 12.6 15
# $ 95% : num 17.5 15.6 14.6 14.1 15.5
# $ 97.5%: num 18.8 17.4 14.9 14.6 15.8
# - attr(*, ".internal.selfref")=<externalptr>
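For readers who would rather not add a package, a base R route is also possible. The sketch below is an assumption-labeled alternative and not the answer author's suggestion: it splits Wind by Month, computes the quantiles per group with sapply(), and builds an ordinary data.frame from the transposed result. The names probs, q and res_base are illustrative.

```r
probs <- c(0.9, 0.95, 0.975)

# split() groups Wind by Month; sapply() returns a 3 x 5 matrix
# (quantiles in rows, months in columns), so transpose it to 5 x 3.
q <- t(sapply(split(airquality$Wind, airquality$Month), quantile, probs = probs))

# The row names of q are the month labels; turn them into a regular column.
res_base <- data.frame(Month = as.integer(rownames(q)), q,
                       check.names = FALSE, row.names = NULL)
str(res_base)
```

The result has one plain numeric column per quantile, so no flattening step is needed afterwards.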