Getting a Geometric Mean/SD based on column value in dplyr

Question

I am wondering if it is possible to get the geometric mean of a set of values based upon the value of another column using dplyr, or if there is a better way.

I have something like this as a data.frame:

Days.Stay | Svc
5         | Med
6         | Surg
...       | ...

I'd like to get a column and call it Geo.Mean.Days.Stay or something like that, where the value is derived as the geometric mean of Days.Stay grouped by Svc, so each Svc will have its own unique geometric mean - and I would like to extend this to the geometric standard deviation. So a data.frame result like so:

Days.Stay | Svc | Geo.Mean.Days.Stay | Geo.SD.Days.Stay
5         | Med | 6.78               | 2.7
6         | Surg| 5.4                | 2.1

Is dplyr a good package for this or should I use an alternate method?

Recommended answer

This should work:

library("dplyr")
# dd is the data.frame from the question; the grouping column is Svc (capital S)
dd %>% group_by(Svc) %>%
    summarise(Geo.Mean.Days.Stay=exp(mean(log(Days.Stay))),
              Geo.SD.Days.Stay=exp(sd(log(Days.Stay))))
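
The desired output in the question keeps one row per stay, whereas summarise() returns one row per Svc. Below is a minimal sketch of the per-row variant, assuming mutate() is used in place of summarise(). The contents of dd and the name res are made up for illustration and are not from the original post.

```r
library(dplyr)

# Made-up example data in the shape described in the question (an assumption).
dd <- data.frame(Days.Stay = c(2, 8, 3, 27),
                 Svc = c("Med", "Med", "Surg", "Surg"))

# mutate() keeps every row and repeats the per-group value on each of them.
res <- dd %>%
  group_by(Svc) %>%
  mutate(Geo.Mean.Days.Stay = exp(mean(log(Days.Stay))),
         Geo.SD.Days.Stay = exp(sd(log(Days.Stay)))) %>%
  ungroup()  # drop the grouping so later steps act on the whole table

res
```

With this toy data, each Med row carries a geometric mean of about 4 and each Surg row about 9. Both versions take log(Days.Stay), so they assume every stay is strictly positive.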

If you were going to use the geometric mean and SD on a regular basis it would be a good idea to define some helper functions (gmean <- function(x) exp(mean(log(x)))) to improve readability ...
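
A sketch of what such helpers might look like. gmean follows the definition given in the answer. gsd is a hypothetical companion name, built the same way from the sd(log(x)) expression in the answer's code. Neither function is part of base R or dplyr.

```r
# Geometric mean: exponentiate the arithmetic mean of the logs.
# Assumes every element of x is strictly positive.
gmean <- function(x) exp(mean(log(x)))

# Geometric standard deviation (hypothetical helper name):
# exponentiate the sample standard deviation of the logs.
gsd <- function(x) exp(sd(log(x)))

stays <- c(2, 8)  # made-up lengths of stay, for illustration only
gmean(stays)      # sqrt(2 * 8), about 4
gsd(stays)
```

Once these are defined, the summarise() call can read Geo.Mean.Days.Stay = gmean(Days.Stay) and Geo.SD.Days.Stay = gsd(Days.Stay).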
