How to analyse a data set both grouped by and ungrouped in one analysis using dplyr

Posted: 2021/5/2 20:45:19 · Tags: r, group-by, dplyr


Problem description


This is my first Stack Overflow question.


I'm trying to use dplyr to process and output a summary of data grouped by a categorical variable (inj_length_cat3) in my dataset. I actually generate this variable from inj_length on the fly using mutate(). I also want to output the same summary without that grouping (still per year). The only way I've figured out is to run the analysis twice, once with and once without the grouping, and then combine the outputs. Ugh.


I'm sure there is a more elegant solution than this and it bugs me. I wonder if anyone would be able to help.

Thanks!

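library(dplyr)

df <- data.frame(year = sample(c(2005, 2006), 20, replace = TRUE),
                 inj_length = sample(1:10, 20, replace = TRUE),
                 hiv_status = sample(0:1, 20, replace = TRUE))

# Summary by year and injection-length category.
# cut() creates the intervals (0,3] and (3,100].
tmp <- df %>%
  mutate(inj_length_cat3 = cut(inj_length, breaks = c(0, 3, 100),
                               labels = c('<3 years', '>3 years'))) %>%
  group_by(year, inj_length_cat3) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = sum(!is.na(hiv_status)),  # count non-missing values, consistent with r
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  ) %>%
  filter(inj_length_cat3 %in% c('<3 years', '>3 years'))

# The same summary by year only, without the injection-length grouping
tmp_all <- df %>%
  group_by(year) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = sum(!is.na(hiv_status)),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  )

# Label the overall rows, then combine both summaries.
# merge() with all = TRUE joins on every shared column, so here it acts as a union of rows.
tmp_all$inj_length_cat3 <- as.factor('All')
tmp <- merge(tmp_all, tmp, all = TRUE)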
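If you do keep the two-pass approach, the combine step can be written with dplyr's bind_rows() instead of merge(), and prop.test() can be run once per group rather than three times. A minimal sketch, assuming dplyr 1.0 or later (for `.groups = "drop"`); `hiv_summary()` is a hypothetical helper, and `set.seed()` is added only so the example is reproducible:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed only for a reproducible example
df <- data.frame(
  year       = sample(c(2005, 2006), 20, replace = TRUE),
  inj_length = sample(1:10, 20, replace = TRUE),
  hiv_status = sample(0:1, 20, replace = TRUE)
)

# Hypothetical helper: summarise hiv_status within whatever groups .data has,
# storing one prop.test() result per group in a list column.
hiv_summary <- function(.data) {
  .data %>%
    summarise(
      r = sum(hiv_status, na.rm = TRUE),
      n = sum(!is.na(hiv_status)),
      test = list(prop.test(r, n)),
      .groups = "drop"
    ) %>%
    mutate(
      p      = vapply(test, function(t) unname(t$estimate), numeric(1)),
      cilow  = vapply(test, function(t) t$conf.int[1], numeric(1)),
      cihigh = vapply(test, function(t) t$conf.int[2], numeric(1))
    ) %>%
    select(-test)
}

# Pass 1: by year and injection-length category
by_cat <- df %>%
  mutate(inj_length_cat3 = cut(inj_length, breaks = c(0, 3, 100),
                               labels = c('<3 years', '>3 years'))) %>%
  group_by(year, inj_length_cat3) %>%
  hiv_summary()

# Pass 2: by year only, labelled "All"
overall <- df %>%
  group_by(year) %>%
  hiv_summary() %>%
  mutate(inj_length_cat3 = "All")

# Convert the factor to character so both inj_length_cat3 columns share a type,
# then stack the two summaries (bind_rows() matches columns by name).
tmp <- bind_rows(
  mutate(by_cat, inj_length_cat3 = as.character(inj_length_cat3)),
  overall
) %>%
  arrange(year, inj_length_cat3)

tmp
```

Because inj_length_cat3 is converted to character before stacking, tmp ends up with a character column; wrap it in factor(..., levels = c('<3 years', '>3 years', 'All')) if you need a fixed level order.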

Recommended answer


I'm not sure you'll consider this more elegant, but you can get a working solution if you first create a data frame that contains all your data twice: once to get the subgroups and once to get the overall summary:

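# Stack two copies of the data: the first keeps its category,
# the second copy is relabelled "All" below
df1 <- rbind(df, df)
# The extra (100, Inf] break only exists to add 'All' as a factor level
df1$inj_length_cat3 <- cut(df1$inj_length, breaks = c(0, 3, 100, Inf),
                           labels = c('<3 years', '>3 years', 'All'))
df1$inj_length_cat3[-(1:nrow(df))] <- "All"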
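The extra (100, Inf] break in the answer is only there to make 'All' a valid factor level. A sketch of an alternative way to build the same stacked data frame, assuming dplyr 1.0 or later: label each copy with mutate(), stack them with bind_rows(), and set the factor levels explicitly. The names cat_levels, by_cat and all_rows are illustrative only, and set.seed() is there just for reproducibility:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed only for a reproducible example
df <- data.frame(
  year       = sample(c(2005, 2006), 20, replace = TRUE),
  inj_length = sample(1:10, 20, replace = TRUE),
  hiv_status = sample(0:1, 20, replace = TRUE)
)

cat_levels <- c('<3 years', '>3 years', 'All')

# Copy 1: each row labelled with its injection-length category
by_cat <- df %>%
  mutate(inj_length_cat3 = as.character(cut(inj_length, breaks = c(0, 3, 100),
                                            labels = cat_levels[1:2])))

# Copy 2: every row labelled "All", which reproduces grouping by year alone
all_rows <- df %>%
  mutate(inj_length_cat3 = "All")

# Stack both copies, then turn the label into a factor with a fixed level order
df1 <- bind_rows(by_cat, all_rows) %>%
  mutate(inj_length_cat3 = factor(inj_length_cat3, levels = cat_levels))

table(df1$inj_length_cat3)
```

The resulting df1 can be fed to the same group_by(year, inj_length_cat3) summary as in the answer.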


Now you just need to run your first analysis on df1, without the mutate() step:

tmp <- df1 %>%
  group_by(year, inj_length_cat3) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = sum(!is.na(hiv_status)),  # count non-missing values, consistent with r
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  ) %>%
  # keep the two subgroups and the overall "All" rows
  filter(inj_length_cat3 %in% c('<3 years', '>3 years', 'All'))
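To check that the one-pass result really reproduces the separate ungrouped analysis, you can compare its 'All' rows with a plain group_by(year) summary. A minimal sketch comparing only r and n (p and its confidence interval are computed from those two values by prop.test()), assuming dplyr 1.0 or later; the names one_pass, overall and all_rows are illustrative, and set.seed() is only for reproducibility:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed only for a reproducible example
df <- data.frame(
  year       = sample(c(2005, 2006), 20, replace = TRUE),
  inj_length = sample(1:10, 20, replace = TRUE),
  hiv_status = sample(0:1, 20, replace = TRUE)
)

# The answer's stacked data frame (cut() applied to all rows of df1)
df1 <- rbind(df, df)
df1$inj_length_cat3 <- cut(df1$inj_length, breaks = c(0, 3, 100, Inf),
                           labels = c('<3 years', '>3 years', 'All'))
df1$inj_length_cat3[-(1:nrow(df))] <- "All"

# One pass over the stacked data
one_pass <- df1 %>%
  group_by(year, inj_length_cat3) %>%
  summarise(r = sum(hiv_status, na.rm = TRUE),
            n = sum(!is.na(hiv_status)), .groups = "drop")

# Separate ungrouped (per-year) analysis on the original data
overall <- df %>%
  group_by(year) %>%
  summarise(r = sum(hiv_status, na.rm = TRUE),
            n = sum(!is.na(hiv_status)), .groups = "drop")

# The "All" rows of the one-pass result, in the same column layout
all_rows <- one_pass %>%
  filter(inj_length_cat3 == "All") %>%
  select(year, r, n)

same <- isTRUE(all.equal(as.data.frame(all_rows), as.data.frame(overall)))
same
```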
