Use R dplyr/purrr To Get Chi-square Output Matrices By Group
Question
I'd like to get chi-square output matrices (e.g., standardized residuals, expected values) by group using elements of the tidyverse. Using the mtcars data set, here's where I've started:
mtcars %>%
  dplyr::select(vs, am) %>%
  table() %>%
  chisq.test(.)
This produces the chi-square test statistic. To get the standardized residuals, for example, the only code I've gotten to work is this:
mtcars %>%
  dplyr::select(vs, am) %>%
  table() %>%
  chisq.test(.) -> chi.out
as.data.frame(chi.out$stdres)
vs am Freq
1 0 0 0.9523038
2 1 0 -0.9523038
3 0 1 -0.9523038
4 1 1 0.9523038
Ideally, I'd like to get the observed values and the standardized residuals into a data frame. Something like this:
cbind(as.data.frame(chi.out$observed),as.data.frame(chi.out$stdres))
vs am Freq vs am Freq
1 0 0 12 0 0 0.9523038
2 1 0 7 1 0 -0.9523038
3 0 1 6 0 1 -0.9523038
4 1 1 7 1 1 0.9523038
Finally, I'd like to do this by group, for example over the cyl column in the mtcars data set. It seems that dplyr plus some version of purrr's map (map_dfr or map_dfc) would do the trick, but I can't quite pull it together. Thanks in advance.
Recommended answer
Here is my proposed solution.
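library(dplyr)
library(tidyr)     # provides spread()
library(reshape2)  # provides melt()

mtcars %>%
  select(vs, am, cyl) %>%
  table() %>%                       # three-way table: vs x am x cyl
  apply(3, chisq.test) %>%          # one chisq.test() per cyl slice
  lapply(`[`, c(6,9)) %>%           # keep elements 6 (observed) and 9 (stdres)
  melt() %>%                        # long format; list levels become L1 and L2
  spread(key = L2, value = value) %>%
  rename(cyl = L1) %>%
  select(cyl, vs, am, observed, stdres) %>%
  arrange(cyl)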
cyl vs am observed stdres
1 4 0 0 0 -0.6422616
2 4 0 1 1 0.6422616
3 4 1 0 3 0.6422616
4 4 1 1 7 -0.6422616
5 6 0 0 0 -2.6457513
6 6 0 1 3 2.6457513
7 6 1 0 4 2.6457513
8 6 1 1 0 -2.6457513
9 8 0 0 12 NaN
10 8 0 1 2 NaN
11 8 1 0 0 NaN
12 8 1 1 0 NaN
This runs a chi-square test for each level of cyl. The grouping happens implicitly: including cyl in select() makes table() build a three-way table, and apply(3, chisq.test) then runs one test per cyl slice. In the end you get the observed values and standardized residuals for every combination of cyl, vs, and am. The stdres values for cyl = 8 are NaN because no 8-cylinder car has vs = 1, so the expected counts in that row are zero. The same approach should be applicable to any data frame.
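Since the question asks specifically about purrr, the same per-group result can also be sketched with split() and map_dfr(). This is only one possible variant, not the answer's method: the name chisq_by_group and the extra expected column are illustrative additions, the fixed factor levels 0:1 are an assumption that suits vs and am in mtcars, and map_dfr() is marked as superseded in recent purrr releases although it is still available.

```r
library(dplyr)
library(purrr)

# One chisq.test() per cyl group, with the results stacked into one data frame.
chisq_by_group <- mtcars %>%
  split(.$cyl) %>%
  map_dfr(function(d) {
    # Fixed factor levels keep every table 2 x 2, even when a level
    # does not occur within a group.
    tab <- table(vs = factor(d$vs, levels = 0:1),
                 am = factor(d$am, levels = 0:1))
    # Small or zero expected counts trigger warnings; they are silenced
    # here only to keep the sketch quiet.
    fit <- suppressWarnings(chisq.test(tab))
    # as.data.frame() on a table and as.vector() on a matrix both run in
    # column-major order, so the rows line up.
    as.data.frame(as.table(fit$observed), responseName = "observed") %>%
      mutate(expected = as.vector(fit$expected),
             stdres   = as.vector(fit$stdres))
  }, .id = "cyl")

chisq_by_group
```

Referring to the components by name (fit$observed, fit$stdres) rather than by position avoids depending on the order of elements in the object returned by chisq.test().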
Hope this is what you were looking for.