Get `chisq.test()$p.value` for several groups using `dplyr::group_by()`

Views: 172 Published: 2020/10/26 3:01:25 Tags: r dplyr chi-squared tidyverse


Question

I'm trying to run a chi-squared test on several groups within the dplyr framework. The problem is that group_by() %>% summarise() doesn't seem to do the trick.

Simulated data (same structure as the problematic data, but random, so the p-values should be high):

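    library(dplyr)

    set.seed(1)
    # sample()'s third positional argument is `replace`, so the original
    # 0.6, 0.7 and 0.8 acted as replace = TRUE, not as probabilities
    foo <- data.frame(
      partido = sample(c("PRI", "PAN"), 100, replace = TRUE),
      genero  = sample(c("H", "M"), 100, replace = TRUE),
      GM      = sample(c("Bajo", "Muy bajo"), 100, replace = TRUE)
    )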

I want to compare the groups defined by GM to see whether the p-value for the crosstab of partido and genero changes, conditional on GM.

The obvious dplyr way should be:

    foo %>%
      group_by(GM) %>%
      summarise(pvalue = chisq.test(.$partido, .$genero)$p.value)  # just the p.value, so summarise is happy

But I get the p-value for the ungrouped data, repeated twice, instead of the p-value for each table:

    # A tibble: 2 × 2
            GM    pvalue
        <fctr>     <dbl>
    1     Bajo 0.8660521
    2 Muy bajo 0.8660521

Testing each group using filter I get:

    library(magrittr)  # provides the %$% exposition pipe
    foo %>% filter(GM == "Bajo") %$% table(partido, genero) %>% chisq.test()

Returns: X-squared = 0.015655, df = 1, p-value = 0.9004

    foo %>% filter(GM == "Muy bajo") %$% table(partido, genero) %>% chisq.test()

Returns: X-squared = 0.50409, df = 1, p-value = 0.4777

dplyr::summarise() works with functions that take more than one argument, so this shouldn't be the problem:

    data.frame(a = 1:10, b = 10:1,
               c = sample(c("Grupo 1", "Grupo 2"), 10, replace = TRUE)) %>%
      group_by(c) %>%
      summarise(r = cor(a, b))

This works like a charm. It just doesn't seem to work with chisq.test.

I managed to get what I wanted with nested models using tidyr::nest() and purrr::map(), but I find the code cumbersome, at least for my students. In fact, I've invested many hours teaching them dplyr (they are a group that finds math and programming very challenging) so that they could avoid vector functions as much as possible.

    foo %>%
      nest(-GM) %>%
      mutate(tabla = map(data, ~table(.))) %>%
      mutate(pvalue = map(tabla, ~chisq.test(.)$p.value)) %>%
      select(GM, pvalue) %>%
      unnest()

    # A tibble: 2 × 2
            GM    pvalue
        <fctr>     <dbl>
    1     Bajo 0.9004276
    2 Muy bajo 0.4777095
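On tidyr 1.0.0 or later, `nest(-GM)` and a bare `unnest()` emit deprecation warnings. The sketch below is one possible way to write the same nested pipeline for those versions; it assumes a newer tidyr than the question used and is not the asker's code. It names the list-column explicitly with `nest(data = -GM)` and uses `purrr::map_dbl()`, which returns a plain numeric vector so no `unnest()` step is needed. The simulated `foo` is rebuilt with `replace = TRUE` spelled out.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the simulated data so the example is self-contained
set.seed(1)
foo <- data.frame(
  partido = sample(c("PRI", "PAN"), 100, replace = TRUE),
  genero  = sample(c("H", "M"), 100, replace = TRUE),
  GM      = sample(c("Bajo", "Muy bajo"), 100, replace = TRUE)
)

nested <- foo %>%
  nest(data = -GM) %>%   # one row per GM; `data` holds partido and genero
  mutate(pvalue = map_dbl(data, ~ chisq.test(table(.x))$p.value)) %>%
  select(GM, pvalue)

nested
```

Each element of `data` is a two-column data frame, so `table(.x)` builds the partido by genero contingency table for that group before it is passed to `chisq.test()`.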

do() also does the trick:

    foo %>%
      group_by(GM) %>%
      do(tidy(chisq.test(.$partido, .$genero)))

    Source: local data frame [2 x 5]
    Groups: GM [2]

            GM statistic   p.value parameter
        <fctr>     <dbl>     <dbl>     <int>
    1     Bajo 0.0156553 0.9004276         1
    2 Muy bajo 0.5040878 0.4777095         1
    # ... with 1 more variables: method <fctr>

As in: Fisher's and Pearson's test for independence
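Since dplyr 1.0.0, `do()` is superseded. A rough equivalent of the do() approach, assuming dplyr 0.8.1 or later (for `group_modify()`) and the broom package (for `tidy()`), might look like the sketch below. Inside `group_modify()`, `.x` is the data for the current group only.

```r
library(dplyr)
library(broom)

# Rebuild the simulated data so the example is self-contained
set.seed(1)
foo <- data.frame(
  partido = sample(c("PRI", "PAN"), 100, replace = TRUE),
  genero  = sample(c("H", "M"), 100, replace = TRUE),
  GM      = sample(c("Bajo", "Muy bajo"), 100, replace = TRUE)
)

tests <- foo %>%
  group_by(GM) %>%
  # .x is the subset of rows for one GM level; tidy() turns the test into a one-row tibble
  group_modify(~ tidy(chisq.test(.x$partido, .x$genero)))

tests
```

The result has one row per group, with the test statistic, p-value, degrees of freedom and method name as columns.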

But why doesn't group_by() work with summarise(chisq.test()$p.value)?

Recommended answer

In dplyr you can generally just use unquoted variable names to access the relevant columns, whether you're inside a group_by() or not. In the original call, `.` is the whole data frame passed along by %>%, not the current group, so .$partido and .$genero always refer to the ungrouped columns. Removing the unneeded .$ accessors from .$partido and .$genero, I get:

    foo %>%
      group_by(GM) %>%
      summarise(pvalue = chisq.test(partido, genero)$p.value)

    # A tibble: 2 × 2
            GM    pvalue
        <fctr>     <dbl>
    1     Bajo 0.9004276
    2 Muy bajo 0.4777095
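To see the difference between `.` and bare column names directly, the following sketch puts both forms side by side in one `summarise()` call. The column names `rows_in_dot`, `rows_in_group`, `p_dot` and `p_group` are made up for illustration. The simulated `foo` is rebuilt with `replace = TRUE` spelled out.

```r
library(dplyr)

# Rebuild the simulated data so the example is self-contained
set.seed(1)
foo <- data.frame(
  partido = sample(c("PRI", "PAN"), 100, replace = TRUE),
  genero  = sample(c("H", "M"), 100, replace = TRUE),
  GM      = sample(c("Bajo", "Muy bajo"), 100, replace = TRUE)
)

check <- foo %>%
  group_by(GM) %>%
  summarise(
    rows_in_dot   = nrow(.),  # `.` is the whole piped data frame: 100 in every row
    rows_in_group = n(),      # n() counts only the rows of the current group
    p_dot   = chisq.test(.$partido, .$genero)$p.value,  # ungrouped test, repeated
    p_group = chisq.test(partido, genero)$p.value       # one test per GM level
  )

check
```

`rows_in_dot` and `p_dot` are identical across rows because they ignore the grouping, while `rows_in_group` and `p_group` are computed within each level of GM.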
