ddply + summarize for repeating same statistical function across large number of columns


Question

Ok, second R question in quick succession.

My data:

           Timestamp    St_01  St_02 ...
1 2008-02-08 00:00:00  26.020 25.840 ...
2 2008-02-08 00:10:00  25.985 25.790 ...
3 2008-02-08 00:20:00  25.930 25.765 ...
4 2008-02-08 00:30:00  25.925 25.730 ...
5 2008-02-08 00:40:00  25.975 25.695 ...
...

Normally I would use a combination of ddply and summarize to calculate ensembles (e.g. the mean for every hour across the whole year).

In the case above, I would create a category, e.g. hour (e.g. strptime(data$Timestamp,"%H") -> data$hour), and then use that category in ddply, like ddply(data, "hour", summarize, St_01=mean(St_01), St_02=mean(St_02), ...), to average by category across each of the columns.

But here is where it gets sticky. I have more than 40 columns to deal with, and I'm not prepared to type them all one by one as parameters to the summarize function. I used to write a loop in shell to generate this code, but that's not how programmers solve problems, is it?

So pray tell, does anyone have a better way of achieving the same result but with fewer keystrokes?

Recommended answer

You can use numcolwise() to run a summary over all numeric columns.

Here is an example using iris:

library(plyr)
ddply(iris, .(Species), numcolwise(mean))
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
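
Applied to timestamped data like that in the question, the same idea might look like the sketch below. The data frame here is a made-up stand-in (random values, only two station columns), and the hour category is derived with format() rather than strptime(); both are assumptions for illustration, not the asker's actual code.

```r
library(plyr)

# Hypothetical stand-in for the asker's data: 10-minute readings, two stations.
set.seed(1)
data <- data.frame(
  Timestamp = seq(as.POSIXct("2008-02-08 00:00:00", tz = "UTC"),
                  by = "10 min", length.out = 288),
  St_01 = rnorm(288, mean = 26.0, sd = 0.1),
  St_02 = rnorm(288, mean = 25.8, sd = 0.1)
)

# Derive the grouping category: hour of day as a two-character string.
data$hour <- format(data$Timestamp, "%H")

# numcolwise(mean) averages every numeric column within each hour.
# The POSIXct Timestamp column does not count as numeric, so it is skipped.
hourly <- ddply(data, .(hour), numcolwise(mean))
head(hourly)
```

With 40 or more station columns the call stays the same, since numcolwise() picks up every numeric column without naming them.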


Similarly, there is catcolwise() to summarise over all categorical columns.
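
As a rough illustration of catcolwise(), the sketch below counts distinct values in each categorical column per group. The data frame and the count_distinct helper are invented for this example and are not part of plyr.

```r
library(plyr)

# Hypothetical data: two categorical columns and one numeric column per site.
df <- data.frame(
  site   = c("A", "A", "A", "B", "B"),
  sensor = c("x", "y", "x", "x", "x"),
  status = c("ok", "ok", "bad", "ok", "ok"),
  temp   = c(26.0, 25.9, 25.8, 26.1, 26.2),
  stringsAsFactors = FALSE
)

# Illustrative helper (not a plyr function): number of distinct values.
count_distinct <- function(x) length(unique(x))

# catcolwise() applies the helper to character/factor/logical columns only;
# the numeric temp column is left out of the result.
distinct_by_site <- ddply(df, .(site), catcolwise(count_distinct))
distinct_by_site
```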

See ?numcolwise for more help and examples.

Edit

An alternative approach is to use reshape2 (proposed by @gsk3). This has more keystrokes in this example, but gives you enormous flexibility:

library(reshape2)

miris <- melt(iris, id.vars="Species")
x <- ddply(miris, .(Species, variable), summarize, mean=mean(value))

dcast(x, Species~variable, value.var="mean")
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
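
One way the extra flexibility of the melted form might be used is to compute several statistics per variable in a single pass and then cast whichever one is needed. This is a sketch under that assumption; the choice of sd and n as additional statistics is illustrative and not from the original answer.

```r
library(plyr)
library(reshape2)

# Long format: one row per (Species, variable, value).
miris <- melt(iris, id.vars = "Species")

# Several summaries per Species/variable pair in one ddply call.
stats <- ddply(miris, .(Species, variable), summarize,
               mean = mean(value),
               sd   = sd(value),
               n    = length(value))

# Cast any one statistic back to wide format, here the standard deviation.
sd_wide <- dcast(stats, Species ~ variable, value.var = "sd")
sd_wide
```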
