Getting a Geometric Mean/SD based on a column value in dplyr
Question
I am wondering if it is possible to get the geometric mean of a set of values based on the value of another column using dplyr, or if there is a better way.
I have something like this as a data.frame:
Days.Stay | Svc
5 | Med
6 | Surg
... | ...
I'd like to get a column, called Geo.Mean.Days.Stay or something similar, whose value is the geometric mean of Days.Stay grouped by Svc, so that each Svc has its own geometric mean. I would also like to extend this to the geometric standard deviation, giving a data.frame result like this:
Days.Stay | Svc | Geo.Mean.Days.Stay | Geo.SD.Days.Stay
5 | Med | 6.78 | 2.7
6 | Surg | 5.4 | 2.1
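For reference, the two statistics are usually defined as follows for the strictly positive values x_1, ..., x_n within one Svc group. The geometric SD below uses the n - 1 denominator, which matches R's sd(). Other sources sometimes use n.

$$
\mathrm{GM}(x) = \exp\!\Big(\frac{1}{n}\sum_{i=1}^{n}\ln x_i\Big) = \Big(\prod_{i=1}^{n} x_i\Big)^{1/n},
\qquad
\mathrm{GSD}(x) = \exp\!\left(\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\big(\ln x_i - \overline{\ln x}\big)^2}\right),
\qquad
\overline{\ln x} = \frac{1}{n}\sum_{i=1}^{n}\ln x_i
$$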
Is dplyr a good package for this, or should I use an alternative method?
Recommended answer
This should work:
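library("dplyr")
# dd is the asker's data.frame; the grouping column is Svc (capital S)
# log() needs strictly positive Days.Stay values (log(0) is -Inf)
dd %>% group_by(Svc) %>%
  summarise(Geo.Mean.Days.Stay = exp(mean(log(Days.Stay))),
            Geo.SD.Days.Stay   = exp(sd(log(Days.Stay))))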
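The following is a self-contained sketch that uses a small made-up data frame in place of the asker's dd. The values are hypothetical and were chosen so the results are easy to check by hand. Note that summarise() returns one row per Svc. The desired output in the question keeps every Days.Stay row and attaches its group's statistics. Swapping in mutate() is one way to get that shape.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's dd (values are made up)
dd <- data.frame(
  Days.Stay = c(2, 8, 4, 1, 9, 3),
  Svc       = c("Med", "Med", "Med", "Surg", "Surg", "Surg")
)

# One row per service: summarise() collapses each group
by_svc <- dd %>%
  group_by(Svc) %>%
  summarise(Geo.Mean.Days.Stay = exp(mean(log(Days.Stay))),
            Geo.SD.Days.Stay   = exp(sd(log(Days.Stay))))

# Every original row kept: mutate() repeats the group statistic on each row
with_stats <- dd %>%
  group_by(Svc) %>%
  mutate(Geo.Mean.Days.Stay = exp(mean(log(Days.Stay))),
         Geo.SD.Days.Stay   = exp(sd(log(Days.Stay)))) %>%
  ungroup()

print(by_svc)
print(with_stats)
```

With these values, Med gets a geometric mean of 4 (the cube root of 2 × 8 × 4 = 64) and a geometric SD of 2. Surg gets 3 and 3.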
If you are going to use the geometric mean and SD regularly, it would be a good idea to define some helper functions (e.g. gmean <- function(x) exp(mean(log(x)))) to improve readability.
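Here is a sketch of that idea. gmean follows the helper suggested above. The name gsd and the na.rm argument on both helpers are hypothetical additions for illustration. Both helpers assume strictly positive inputs, because log(0) is -Inf and log() of a negative number is NaN.

```r
library(dplyr)

# gmean mirrors the helper from the answer; gsd and na.rm are hypothetical additions
gmean <- function(x, na.rm = FALSE) exp(mean(log(x), na.rm = na.rm))
gsd   <- function(x, na.rm = FALSE) exp(sd(log(x), na.rm = na.rm))

# Hypothetical toy data (values are made up)
dd <- data.frame(
  Days.Stay = c(2, 8, 4, 1, 9, 3),
  Svc       = c("Med", "Med", "Med", "Surg", "Surg", "Surg")
)

# The pipeline now reads as plain statistics per Svc group
res <- dd %>%
  group_by(Svc) %>%
  summarise(Geo.Mean.Days.Stay = gmean(Days.Stay),
            Geo.SD.Days.Stay   = gsd(Days.Stay))

print(res)
```

The same helpers can be used inside mutate() when the per-row layout is wanted.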