How to analyse a data set both grouped by and ungrouped in one analysis using dplyr

Question

This is my first stackoverflow question.

I'm trying to use dplyr to process and output a summary of data grouped by a categorical variable (inj_length_cat3) in my dataset. Actually, I generate this variable (from inj_length) on the fly using mutate(). I also want to output the same summary of the data without grouping. The only way I figured out how to do that is to do the analysis twice over, once with, once without grouping, and then combine the outputs. Ugh.

I'm sure there is a more elegant solution than this and it bugs me. I wonder if anyone would be able to help.

Thanks!

library(dplyr)

# Simulated example data: 20 rows with year, inj_length and hiv_status
df <- data.frame(
  year = sample(c(2005, 2006), 20, replace = TRUE),
  inj_length = sample(1:10, 20, replace = TRUE),
  hiv_status = sample(0:1, 20, replace = TRUE)
)

# Summary grouped by year and by the category derived from inj_length
tmp <- df %>%
  mutate(inj_length_cat3 = cut(inj_length, breaks = c(0, 3, 100), labels = c('<3 years', '>3 years'))) %>%
  group_by(year, inj_length_cat3) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = length(hiv_status),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  ) %>%
  filter(inj_length_cat3 %in% c('<3 years', '>3 years'))

# The same summary again, grouped by year only
tmp_all <- df %>%
  group_by(year) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = length(hiv_status),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  )

# Label the ungrouped rows 'All' and combine the two outputs
tmp_all$inj_length_cat3 <- as.factor('All')
tmp <- merge(tmp_all, tmp, all = TRUE)

Recommended answer

I'm not sure whether you'd consider this more elegant, but you can get a working solution if you first create a data frame that contains all your data twice: one copy to get the subgroups and one copy to get the overall summary:

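# Stack two copies of the data: the first keeps its category, the second becomes 'All'
df1 <- rbind(df, df)
# The extra break (Inf) only exists so that 'All' is a valid factor level
df1$inj_length_cat3 <- cut(df1$inj_length, breaks = c(0, 3, 100, Inf),
                           labels = c('<3 years', '>3 years', 'All'))
# Relabel every row of the second copy
df1$inj_length_cat3[-(1:nrow(df))] <- "All"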
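The third break and the 'All' label passed to cut() serve one purpose: they register 'All' as a factor level, because assigning a value that is not an existing level to a factor yields NA with a warning. A quick way to check the stacked frame is sketched below. It assumes the same simulated data as in the question; set.seed() and the counts variable are additions for illustration and are not part of the original answer.

```r
set.seed(42)  # assumption: fixed seed so the check is reproducible

# Simulated data in the same shape as the question's df
df <- data.frame(
  year = sample(c(2005, 2006), 20, replace = TRUE),
  inj_length = sample(1:10, 20, replace = TRUE),
  hiv_status = sample(0:1, 20, replace = TRUE)
)

# Stack two copies and label the second copy 'All'
df1 <- rbind(df, df)
df1$inj_length_cat3 <- cut(df1$inj_length, breaks = c(0, 3, 100, Inf),
                           labels = c('<3 years', '>3 years', 'All'))
df1$inj_length_cat3[-(1:nrow(df))] <- "All"

# Rows per category: 'All' should cover every original row exactly once
counts <- table(df1$inj_length_cat3)
print(counts)
```

If the 'All' count equals nrow(df) and no NA values appear in inj_length_cat3, the relabelling worked as intended.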

Now you just need to run your first analysis without mutate():

# Same summary as in the question; inj_length_cat3 already exists in df1
tmp <- df1 %>%
  group_by(year, inj_length_cat3) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = length(hiv_status),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  ) %>%
  filter(inj_length_cat3 %in% c('<3 years', '>3 years', 'All'))
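The same "stack the data twice" idea can also be written as one dplyr pipeline. The sketch below is a variation and not the answerer's code: it uses bind_rows() to append a relabelled copy, stores the category as character so 'All' needs no factor level, and assumes dplyr 1.0.0 or later for the .groups argument. The names with_cat and tmp2 and the set.seed() call are hypothetical additions for illustration.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the example is reproducible

# Simulated data in the same shape as the question's df
df <- data.frame(
  year = sample(c(2005, 2006), 20, replace = TRUE),
  inj_length = sample(1:10, 20, replace = TRUE),
  hiv_status = sample(0:1, 20, replace = TRUE)
)

# Derive the category once, stored as character
with_cat <- df %>%
  mutate(inj_length_cat3 = as.character(
    cut(inj_length, breaks = c(0, 3, 100), labels = c("<3 years", ">3 years"))
  ))

# Append a copy in which every row is labelled 'All', then summarise once
tmp2 <- bind_rows(with_cat, mutate(with_cat, inj_length_cat3 = "All")) %>%
  group_by(year, inj_length_cat3) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = length(hiv_status),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2],
    .groups = "drop"
  )

print(tmp2)
```

With small groups prop.test() may warn that the chi-squared approximation is unreliable, which is a property of the test and not of the stacking approach. Note also that stacking doubles the number of rows held in memory, which may matter for large data sets.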
