Summarizing count and conditional aggregate functions on the same factor

Problem description

The quick and short of it is that I'm having problems summarizing count and aggregate functions with conditions on the same factor.

Let's say I have this data frame:

library(dplyr)

df = tbl_df(data.frame(
    company=c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"), 
    year=c("2011", "2010", "2009", "2011", "2010", "2013"), 
    product=c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust", 
              "Kindness", "Helping Hand"), 
    price=c("5.67", "7.12", "12.99", "10.99", NA, FALSE)))

which creates this dataframe (in essence):

   company year  product             price
1    Acme  2011  Wrench              5.67
2    Meca  2010  Hammer              7.12
3    Emca  2009  Sonic Screwdriver   12.99
4    Acme  2011  Fairy Dust          10.99
5    Meca  2010  Kindness            NA
...  ...   ...   ...                 ...
n    Emca  2013  Helping Hand        FALSE

Let's say I want to run df <- group_by(df, company, year, product) and then get the following info all in one collection (i.e. a data frame):

  1. Count of each price listing (including NA and FALSE)
  2. Count of each listing whose price is NA
  3. Average price, excluding NA and FALSE
  4. Max price

summarize(df, count = n()) #satisfies first item obviously


I'm having issues trying to get the others. I think I need to use pipe operators? If so, can anyone provide some guidance?

This is what I've tried and it is blatantly wrong, but I'm not sure where to go next:

 summarize(df,
           total.count = n(),
           count = filter(df, is.na(price)),
           avg.price = filter(df, !is.na(price), price != FALSE),
           max.price = max(filter(df, !is.na(price), price != FALSE)))

And yes, I have reviewed documentation and I'm sure the answers are there, but they might be too advanced for my understanding. Thanks in advance!

Recommended answer

Assuming that your original dataset is similar to the one you created (i.e. with NA stored as a character string): if you read the data with read.table, you could specify na.strings. But since na.strings defaults to "NA", I guess the NAs would be detected automatically.
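
To illustrate the na.strings suggestion, here is a minimal sketch, assuming the raw data arrives as CSV text (the csv string below is hypothetical and simply mirrors the example data frame). Listing "FALSE" alongside "NA" in na.strings treats both as missing at read time, so price is parsed as numeric straight away.

```r
# Hypothetical CSV text mirroring the example data frame.
csv <- "company,year,product,price
Acme,2011,Wrench,5.67
Meca,2010,Hammer,7.12
Emca,2009,Sonic Screwdriver,12.99
Acme,2011,Fairy Dust,10.99
Meca,2010,Kindness,NA
Emca,2013,Helping Hand,FALSE"

# Treat both "NA" and "FALSE" as missing while reading,
# so the price column is converted to numeric directly.
df <- read.csv(text = csv, na.strings = c("NA", "FALSE"))
str(df$price)
```

Note that na.strings applies to every column, so this only works if "FALSE" never appears as a legitimate value elsewhere in the file.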

The price column is a factor, which needs to be converted to numeric. Calling as.numeric directly on a factor returns its internal level codes, so convert it to character first; as.numeric then coerces all the non-numeric elements (i.e. "NA" and FALSE) to NA, with a warning.
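
A small base-R sketch of why the as.character step matters, assuming the price values from the answer's data; factor() here stands in for the stringsAsFactors = TRUE default that data.frame() had before R 4.0.0.

```r
# Price values as in the answer's data, stored as a factor.
price <- factor(c("5.67", "7.12", "12.99", "10.99", "NA", "FALSE"))

# Pitfall: as.numeric() on a factor returns the integer level codes,
# not the numbers written in the labels.
codes <- as.numeric(price)

# Going through character first parses the labels themselves;
# "NA" and "FALSE" become NA (the coercion warning is suppressed here).
values <- suppressWarnings(as.numeric(as.character(price)))

print(codes)
print(values)
```

On R 4.0.0 or later, data.frame() keeps strings as character by default, in which case as.character() is a harmless no-op and the same expression still works.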

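library(dplyr)
df %>%
     # factor -> character -> numeric; "NA" and FALSE become NA
     mutate(price=as.numeric(as.character(price))) %>%  
     group_by(company, year, product) %>%
     summarise(total.count=n(),                      # all rows, NA included
               count=sum(is.na(price)),              # rows whose price is NA
               avg.price=mean(price,na.rm=TRUE),     # mean of non-NA prices
               max.price=max(price, na.rm=TRUE))     # max of non-NA prices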
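
For comparison, here is a base-R sketch of the same per-group summary, offered as a swapped-in technique (split/lapply instead of dplyr) rather than the answer's method; the helper summarise_group is a hypothetical name. One deliberate difference: in the dplyr pipeline above, a group whose prices are all NA (Kindness here) gets avg.price = NaN and max.price = -Inf, because base R's mean() of no values is NaN and max() with na.rm = TRUE returns -Inf with a warning. This sketch returns NA for such groups instead.

```r
# Data as in the answer; stringsAsFactors = TRUE mimics pre-4.0 R.
df <- data.frame(
  company = c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"),
  year    = c("2011", "2010", "2009", "2011", "2010", "2013"),
  product = c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust",
              "Kindness", "Helping Hand"),
  price   = c("5.67", "7.12", "12.99", "10.99", "NA", FALSE),
  stringsAsFactors = TRUE
)

# Factor -> character -> numeric; "NA" and "FALSE" become NA.
df$price <- suppressWarnings(as.numeric(as.character(df$price)))

# Hypothetical helper: summarise one group (a data frame slice) into one row.
summarise_group <- function(g) {
  ok <- g$price[!is.na(g$price)]        # valid (non-NA) prices only
  data.frame(
    company     = g$company[1],
    year        = g$year[1],
    product     = g$product[1],
    total.count = nrow(g),              # all rows, NA included
    count       = sum(is.na(g$price)),  # rows whose price is NA
    avg.price   = if (length(ok) > 0) mean(ok) else NA_real_,
    max.price   = if (length(ok) > 0) max(ok) else NA_real_
  )
}

# Split by the grouping columns, summarise each piece, stack the rows.
groups <- split(df, list(df$company, df$year, df$product), drop = TRUE)
result <- do.call(rbind, lapply(groups, summarise_group))
rownames(result) <- NULL
print(result)
```

With this particular data every company/year/product combination occurs exactly once, so each total.count is 1; grouping by fewer columns (e.g. company and year) would give multi-row groups.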



data

I am using the same dataset that was shown (without the ... row), except that NA is written as the string "NA":

df = tbl_df(data.frame(
    company=c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"),
    year=c("2011", "2010", "2009", "2011", "2010", "2013"),
    product=c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust",
              "Kindness", "Helping Hand"),
    price=c("5.67", "7.12", "12.99", "10.99", "NA", FALSE)))
