Means of multiple columns by multiple groups

Question

I am trying to find the means, excluding NAs, of multiple columns within a data frame, grouped by multiple variables:

airquality <- data.frame(City = c("CityA", "CityA","CityA",
                                  "CityB","CityB","CityB",
                                  "CityC", "CityC"),
                         year = c("1990", "2000", "2010", "1990", 
                                  "2000", "2010", "2000", "2010"),
                         month = c("June", "July", "August",
                                   "June", "July", "August",
                                   "June", "August"),
                         PM10 = c(runif(3), rnorm(5)),
                         PM25 = c(runif(3), rnorm(5)),
                         Ozone = c(runif(3), rnorm(5)),
                         CO2 = c(runif(3), rnorm(5)))
airquality

To see which column numbers to select, I list the column names with their positions:

nam <- names(airquality)
namelist <- data.frame(matrix(t(nam)))
namelist
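Instead of reading positions off namelist by eye, a small base R sketch can look them up with match(). It assumes the data frame has the column names used in the question; the names keep, idx and subset_df are illustrative, not from the original post.

```r
# Rebuild the question's data frame (values are random, as in the question)
set.seed(42)
airquality <- data.frame(
  City  = c("CityA", "CityA", "CityA", "CityB", "CityB", "CityB", "CityC", "CityC"),
  year  = c("1990", "2000", "2010", "1990", "2000", "2010", "2000", "2010"),
  month = c("June", "July", "August", "June", "July", "August", "June", "August"),
  PM10  = c(runif(3), rnorm(5)),
  PM25  = c(runif(3), rnorm(5)),
  Ozone = c(runif(3), rnorm(5)),
  CO2   = c(runif(3), rnorm(5))
)

# keep, idx and subset_df are illustrative names (not from the question).
# Look the wanted columns up by name rather than counting positions by hand.
keep <- c("City", "year", "PM25", "Ozone", "CO2")
idx <- match(keep, names(airquality))
idx
#> [1] 1 2 5 6 7

# Subset using the looked-up positions
subset_df <- airquality[, idx]
```

Selecting by name (airquality[, keep]) gives the same columns and keeps working if the column order changes.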

I want to calculate the mean of PM25, Ozone, and CO2 by City and year. That means I need columns 1, 2, and 5:7.

library(reshape2)
acast(airquality, year ~ City, mean, na.rm = TRUE)

But this is not really what I want: it includes means of things I do not need, and the result is not a data frame. I could convert it and then drop the unwanted columns, but that seems like a very inefficient way to do it.

Is there a better way?

Answer

我们可以使用 dplyrsummarise_at 来得到相关的 mean按感兴趣的列分组后的列

We can use dplyr's summarise_at to get the mean of the relevant columns after grouping by the columns of interest; passing na.rm = TRUE on to mean excludes the NAs:

library(dplyr)
airquality %>%
   group_by(City, year) %>%
   summarise_at(vars("PM25", "Ozone", "CO2"), mean, na.rm = TRUE)

或者使用dplyrdevel版本(version - ‘0.8.99.9000’)

Or, with dplyr 1.0.0 or later (at the time of writing, the development version ‘0.8.99.9000’), use across():

airquality %>%
     group_by(City, year) %>%
     summarise(across(PM25:CO2, ~ mean(.x, na.rm = TRUE)))
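If you would rather avoid a package dependency, base R's aggregate() can do the same grouping and returns a data frame directly. This is an alternative to the dplyr answer, not part of it, sketched under the assumption that the grouping and value columns are named as in the question. The toy data frame below is hypothetical: it has two rows per group and one NA so the effect of na.rm = TRUE is visible (in the question's data every City/year pair occurs only once).

```r
# Hypothetical toy data: two rows per City/year group, one missing PM25 value
toy <- data.frame(
  City  = c("CityA", "CityA", "CityB", "CityB"),
  year  = c("2000", "2000", "2000", "2000"),
  month = c("June", "July", "June", "July"),
  PM25  = c(1, 3, NA, 4),
  Ozone = c(2, 4, 6, 8),
  CO2   = c(10, 20, 30, 40)
)

# Data frame method of aggregate(): the extra argument na.rm = TRUE is
# passed through to mean(), so NAs are skipped column by column.
means <- aggregate(
  toy[c("PM25", "Ozone", "CO2")],
  by = toy[c("City", "year")],
  FUN = mean,
  na.rm = TRUE
)
means
# CityA: PM25 2, Ozone 3, CO2 15
# CityB: PM25 4 (NA skipped), Ozone 7, CO2 35
```

Using the data frame method rather than the formula interface matters here: aggregate(cbind(PM25, Ozone, CO2) ~ City + year, data = toy, FUN = mean, na.rm = TRUE) defaults to na.action = na.omit, which drops the whole CityB row with the missing PM25 before averaging, so CityB's Ozone and CO2 means would change as well.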
