Use R dplyr/purrr To Get Chi-square Output Matrices By Group


Question

I'd like to get chi-square output matrices (e.g., standardized residuals, expected values) by group using elements of the tidyverse. Using the mtcars data set, here's where I've started:

mtcars %>% 
  dplyr::select(vs, am) %>%
  table() %>%
  chisq.test(.) 

This produces the chi-square test statistic. To get the standardized residuals, for example, the only code I've gotten to work is this:

mtcars %>% 
  dplyr::select(vs, am) %>%
  table() %>%
  chisq.test(.) -> chi.out

as.data.frame(chi.out$stdres)

  vs am       Freq
1  0  0  0.9523038
2  1  0 -0.9523038
3  0  1 -0.9523038
4  1  1  0.9523038

Ideally, I'd like to get the observed values and the standardized residuals into a dataframe format. Something like this:

cbind(as.data.frame(chi.out$observed),as.data.frame(chi.out$stdres))

  vs am Freq vs am       Freq
1  0  0   12  0  0  0.9523038
2  1  0    7  1  0 -0.9523038
3  0  1    6  0  1 -0.9523038
4  1  1    7  1  1  0.9523038

Finally, I'd like to do this by group, for example over the cyl column in the mtcars data set. It seems that dplyr and some version of purrr's map (map_dfr or map_dfc) would do the trick, but I can't quite pull it together. Thanks in advance.

Recommended answer

So this is my proposal for a solution.

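library(dplyr)
library(tidyr)     # spread()
library(reshape2)  # melt()

mtcars %>% 
  select(vs, am, cyl) %>%
  table() %>%                                # 3-way table: vs x am x cyl
  apply(3, chisq.test) %>%                   # one test per cyl slice
  lapply(`[`, c("observed", "stdres")) %>%   # keep two components (positions 6 and 9)
  melt() %>%                                 # long format; L1 = cyl, L2 = component name
  spread(key = L2, value = value) %>%
  rename(cyl = L1) %>%
  select(cyl, vs, am, observed, stdres) %>%
  arrange(cyl)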


   cyl vs am observed     stdres
1    4  0  0        0 -0.6422616
2    4  0  1        1  0.6422616
3    4  1  0        3  0.6422616
4    4  1  1        7 -0.6422616
5    6  0  0        0 -2.6457513
6    6  0  1        3  2.6457513
7    6  1  0        4  2.6457513
8    6  1  1        0 -2.6457513
9    8  0  0       12        NaN
10   8  0  1        2        NaN
11   8  1  0        0        NaN
12   8  1  1        0        NaN

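This runs a chi-square test for each level of cyl. The grouping is implicit: table() builds a three-way vs x am x cyl contingency table, and apply(3, chisq.test) runs the test on each cyl slice. In the end you get the observed values and standardized residuals for every combination of cyl, vs, and am. The stdres values for cyl = 8 are NaN because no 8-cylinder car has vs = 1, so that slice has an all-zero row. The same pattern should carry over to other data frames with two categorical variables and a grouping column.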
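As a side note on how the two matrices are picked out: the object returned by chisq.test() is a list, so its components can be selected by name or by position. A minimal check, reusing the chi.out name from the question:

```r
# Run the test on the full vs x am table, as in the question
chi.out <- chisq.test(table(vs = mtcars$vs, am = mtcars$am))

# List the components in order; "observed" is 6th and "stdres" is 9th
names(chi.out)

# Selecting by name avoids relying on positions
str(chi.out[c("observed", "stdres")])
```

Selecting by name is a little more robust than `c(6, 9)` if the order of the components ever changes.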

Hope this is what you were looking for.
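Since the question asked about purrr specifically, here is a rough sketch of the same idea with split() and purrr::map(). It is not part of the answer above; the name chisq_by_group and the choice to fix the factor levels to 0:1 (so that every cyl group yields a full 2 x 2 table) are assumptions made for illustration.

```r
library(dplyr)
library(purrr)

chisq_by_group <- mtcars %>%
  split(.$cyl) %>%                       # one data frame per cyl value
  map(function(d) {
    # Fixed levels keep empty cells, so each group gives a 2 x 2 table
    tab <- table(vs = factor(d$vs, levels = 0:1),
                 am = factor(d$am, levels = 0:1))
    # Small counts trigger approximation warnings; silenced here
    res <- suppressWarnings(chisq.test(tab))
    obs <- as.data.frame(res$observed, responseName = "observed")
    std <- as.data.frame(res$stdres, responseName = "stdres")
    left_join(obs, std, by = c("vs", "am"))
  }) %>%
  bind_rows(.id = "cyl")                 # list names become the cyl column

chisq_by_group
```

The result has one row per cyl/vs/am combination with observed and stdres columns; cyl comes back as a character column because it is taken from the list names.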
