`group_by` and keep grouping levels as the nested data frames' names

Problem description

This question is related to, and inspired by, "Can't use emmeans inside map".

I am doing several steps of data analysis with the following code. I want to keep my grouping factor's levels as the nested data frames' names and use those names to identify each step along the way, instead of the default enumeration [[1]], [[2]], [[3]], etc. I don't understand the error I get. How can I fix my code?

library(dplyr)
library(tidyr)   # nest() comes from tidyr, not dplyr
library(purrr)
library(emmeans)
data("warpbreaks")
wb_emm <-  warpbreaks %>%
  group_by(tension) %>% 
  setNames(unique(.x$tension)) %>%
  nest() %>%
  mutate(models=map(data,~glm(breaks~wool,data=.x))) %>%
  mutate(jt = map(models, ~emmeans::joint_tests(.x, data = .x$data))) %>%
  mutate(means=map(models,~emmeans::emmeans(.x,"wool",data=.x$data))) %>%
  mutate(p_cont = map(means, ~emmeans::contrast(.x, "pairwise",infer = c(T,T))))

Error in unique(.x$tension) : object '.x' not found

I originally did group_by(tension) %>% setNames(unique(tension)) and got Error in unique(tension) : object 'tension' not found. I also tried split(.$tension), but it conflicts with nest().

But the tension levels are clearly there:

 unique(warpbreaks$tension)
[1] L M H
Levels: L M H

The code runs well without the setNames(unique(.x$tension)) %>% step.

wb_emm$p_cont
[[1]]
 contrast estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
 A - B        16.3 6.87 Inf      2.87      29.8 2.378   0.0174 

Confidence level used: 0.95 

[[2]]
 contrast estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
 A - B       -4.78 4.27 Inf     -13.1      3.59 -1.119  0.2630 

Confidence level used: 0.95 

[[3]]
 contrast estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
 A - B        5.78 3.79 Inf     -1.66      13.2 1.523   0.1277 

Confidence level used: 0.95 

Thanks.

Update: I tried the second solution provided by Ronak Shah below on the diamonds data, but the names were unchanged. The code runs with either ungroup() %>% or ungroup %>%.

library(ggplot2)  # provides the diamonds data set

diamonds %>%
  group_by(cut) %>%
  nest() %>% 
  ungroup %>%
  mutate(models=map(data,~glm(price ~ x + y + z + clarity + color,data=.x)),
         jt = map(models, ~emmeans::joint_tests(.x, data = .x$data)),
         means=map(models,~emmeans::emmeans(.x,"color",data=.x$data)),
         p_cont = map(means, ~emmeans::contrast(.x, "pairwise",infer = c(T,T))),
         across(models:p_cont, stats::setNames,  .$cut)) -> diamond_result
> diamond_result$jt
[[1]]
 model term df1 df2 F.ratio p.value
 x            1 Inf 611.626 <.0001 
 y            1 Inf   2.914 0.0878 
 z            1 Inf 100.457 <.0001 
 clarity      7 Inf 800.852 <.0001 
 color        6 Inf 256.796 <.0001 

Recommended answer

You need to apply setNames to the result of the map step, after nest() and ungroup(), using the tension column as the names:

library(tidyverse)

warpbreaks %>%
  group_by(tension) %>% 
  nest() %>%
  # drop the grouping so mutate() sees all three rows at once
  ungroup %>%
  mutate(models=map(data,~glm(breaks~wool,data=.x)),
        jt = map(models, ~emmeans::joint_tests(.x, data = .x$data)),
        means=map(models,~emmeans::emmeans(.x,"wool",data=.x$data)),
        # .$tension is the tension column of the piped nested tibble
        p_cont = setNames(map(means, 
                  ~emmeans::contrast(.x, "pairwise",infer = c(T,T))),.$tension))
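Why the original pipe fails: in a magrittr pipe, setNames(unique(.x$tension)) is evaluated as setNames(., unique(.x$tension)). The pronoun .x only exists inside purrr's ~ lambdas (such as the ones passed to map()), so R cannot find it. Even with a valid column reference, setNames() applied to a data frame renames its columns, not its groups. A minimal sketch contrasting the two, using only dplyr, tidyr and purrr (the object names renamed and nested are illustrative, and the emmeans steps are left out):

```r
library(dplyr)
library(tidyr)
library(purrr)

# setNames() on a data frame relabels its columns, not its groups
renamed <- setNames(data.frame(a = 1, b = 2), c("x", "y"))
names(renamed)  # "x" "y"

# The level names belong on the list-column, after nest() and ungroup();
# inside mutate(), tension is the (factor) column of the nested tibble
nested <- warpbreaks %>%
  group_by(tension) %>%
  nest() %>%
  ungroup() %>%
  mutate(models = setNames(map(data, ~ glm(breaks ~ wool, data = .x)),
                           tension))

names(nested$models)  # "L" "M" "H"
```

The factor passed to setNames() is converted to its labels, so the list elements are named L, M and H rather than 1, 2 and 3.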

If you want to name all of the list-columns, use across():

warpbreaks %>%
  group_by(tension) %>% 
  nest() %>%
  ungroup %>%
  mutate(models=map(data,~glm(breaks~wool,data=.x)),
         jt = map(models, ~emmeans::joint_tests(.x, data = .x$data)),
         means=map(models,~emmeans::emmeans(.x,"wool",data=.x$data)),
         p_cont = map(means, ~emmeans::contrast(.x, "pairwise",infer = c(T,T))),
         across(models:p_cont, setNames,  .$tension)) -> result
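Passing extra arguments through the ... of across(), as across(models:p_cont, setNames, .$tension) does, is deprecated as of dplyr 1.1.0. A sketch of the same naming step with an anonymous function instead, assuming dplyr 1.1.0 or later; to keep it runnable without emmeans, a hypothetical coefs column stands in for the jt, means and p_cont columns:

```r
library(dplyr)
library(tidyr)
library(purrr)

result <- warpbreaks %>%
  group_by(tension) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    models = map(data, ~ glm(breaks ~ wool, data = .x)),
    coefs  = map(models, coef),
    # name every selected list-column by the tension level in the same row;
    # the lambda can refer to the tension column directly
    across(c(models, coefs), ~ setNames(.x, tension))
  )

names(result$coefs)  # "L" "M" "H"
```

Referring to the tension column inside the lambda avoids the .$tension pronoun and keeps the names aligned with the rows of the nested tibble.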

result$jt

#$L
# model term df1 df2 F.ratio p.value
# wool         1 Inf   5.653 0.0174 


#$M
# model term df1 df2 F.ratio p.value
# wool         1 Inf   1.253 0.2630 


#$H
# model term df1 df2 F.ratio p.value
# wool         1 Inf   2.321 0.1277 
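The question also mentions trying split(.$tension). If the nested tibble itself is not needed, split() already returns a list named by the factor levels, and purrr::map() preserves the names of its input, so every downstream step stays labelled L, M and H. A minimal sketch of that alternative, with hypothetical object names and coef() standing in for the emmeans calls:

```r
library(purrr)

# split() by a factor returns a list named by its levels, in level order
by_tension <- split(warpbreaks, warpbreaks$tension)

# map() keeps the input names, so each result is labelled L, M or H
models <- map(by_tension, ~ glm(breaks ~ wool, data = .x))
coefs  <- map(models, coef)

names(coefs)  # "L" "M" "H"
```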
