R: Frequency table for multiselect survey question across several columns
Question
I want to do a fairly common analysis of survey questions in R, but am stuck in the middle.
Imagine a survey that asks which brands you associate with certain features. For example, the brands could be PlayStation, XBox, ... and the features could be "speed", "graphics", ...; each brand can be checked for several features (i.e. multi-select). Something like this: https://www.harvestyourdata.com/fileadmin/images/question-type-screenshots/Grid-multi-select.jpg
These questions are often referred to as multi-select grid or matrix questions.
Anyway, from a data perspective, this kind of data is usually stored in wide format where each row*column combination is one variable, which is 0/1 coded (0 if the survey participant doesn't check the box, 1 otherwise).
Assuming we have 5 brands and 10 items, we would have 50 variables in total, ideally following a nice, structured naming scheme, e.g. item1_column1, item2_column1, item3_column1, [...], item1_column2 and so on.
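To make the naming scheme concrete, here is a minimal sketch that generates the 50 variable names for 10 items and 5 columns. The labels "item" and "column" are only placeholders taken from the example above; real survey exports will likely use other names.

```r
# Hypothetical labels: 10 items and 5 columns (brands), as in the example above
items   <- paste0("item", 1:10)
columns <- paste0("column", 1:5)

# outer() pastes every item/column pair; as.vector() reads the resulting
# matrix column by column, so the item index varies fastest
var_names <- as.vector(outer(items, columns, paste, sep = "_"))

length(var_names)    # 50
head(var_names, 3)   # "item1_column1" "item2_column1" "item3_column1"
```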
Now, I want to analyze (frequency table) all of these variables in one iteration. I've already found the cross.multi.table function in the questionr package. However, it only allows analyzing all items against a single factor. What I need instead is to handle several columns at the same time.
Any ideas? Maybe I'm missing a function from another package, or can this easily be done with tidyverse or even with the cross.multi.table function?
Using this data as test input:
dat = data.frame(item1_column1 = c(0,1,1,1),
                 item2_column1 = c(1,1,1,0),
                 item3_column1 = c(0,0,1,1),
                 item1_column2 = c(1,1,1,0),
                 item2_column2 = c(0,1,1,1),
                 item3_column2 = c(1,0,1,1),
                 item1_column3 = c(0,1,1,0),
                 item2_column3 = c(1,1,1,1),
                 item3_column3 = c(0,0,1,0))
I'd expect this output:
      column1 column2 column3
item1       3       3       2
item2       3       3       4
item3       2       3       1
or ideally as proportions/percentages:
      column1 column2 column3
item1     75%     75%     50%
item2     75%     75%    100%
item3     50%     75%     25%
Recommended answer
One way could be to get the data into long format using gather, separate the key column into item and column based on _, group_by item and column, calculate the percentage from the value column, and spread the data back to wide format.
library(dplyr)
library(tidyr)

dat %>%
  gather(key, value) %>%
  separate(key, into = c("item", "column"), sep = "_") %>%
  group_by(item, column) %>%
  summarise(prop = mean(value) * 100) %>%
  spread(column, prop)

#  item  column1 column2 column3
#  <chr>   <dbl>   <dbl>   <dbl>
#1 item1      75      75      50
#2 item2      75      75     100
#3 item3      50      75      25
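If you want the raw counts from the first expected table instead of percentages, the same pipeline works with sum() in place of mean(). The sketch below also swaps in pivot_longer()/pivot_wider(), which tidyr documents as the successors of gather()/spread(); it assumes tidyr >= 1.0 and dplyr >= 1.0 and is not part of the original answer. The name res is just an illustrative variable.

```r
library(dplyr)
library(tidyr)

# Test data from the question
dat <- data.frame(item1_column1 = c(0,1,1,1),
                  item2_column1 = c(1,1,1,0),
                  item3_column1 = c(0,0,1,1),
                  item1_column2 = c(1,1,1,0),
                  item2_column2 = c(0,1,1,1),
                  item3_column2 = c(1,0,1,1),
                  item1_column3 = c(0,1,1,0),
                  item2_column3 = c(1,1,1,1),
                  item3_column3 = c(0,0,1,0))

res <- dat %>%
  # Long format; names_sep splits "item1_column1" into item and column
  pivot_longer(everything(),
               names_to = c("item", "column"),
               names_sep = "_") %>%
  group_by(item, column) %>%
  # sum() counts the checked boxes; use mean(value) * 100 for percentages
  summarise(n = sum(value), .groups = "drop") %>%
  # Back to wide format: one row per item, one column per brand
  pivot_wider(names_from = column, values_from = n)

res
```

This relies on every variable name containing exactly one underscore; names with more underscores would need names_pattern or a different separator.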
A bit shorter (Thanks to @M-M)
dat %>%
  summarise_all(~mean(.) * 100) %>%
  gather(key, value) %>%
  separate(key, into = c("item", "column"), sep = "_") %>%
  spread(column, value)
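For comparison, the same table can be built without any packages. This is only a sketch in base R, assuming (as above) that every variable name has the form item_column with a single underscore; the names pct, parts and tab are illustrative.

```r
# Test data from the question
dat <- data.frame(item1_column1 = c(0,1,1,1),
                  item2_column1 = c(1,1,1,0),
                  item3_column1 = c(0,0,1,1),
                  item1_column2 = c(1,1,1,0),
                  item2_column2 = c(0,1,1,1),
                  item3_column2 = c(1,0,1,1),
                  item1_column3 = c(0,1,1,0),
                  item2_column3 = c(1,1,1,1),
                  item3_column3 = c(0,0,1,0))

# Share of respondents who checked each box, in percent
# (use colSums(dat) instead to get raw counts)
pct <- colMeans(dat) * 100

# Two-column character matrix: item name, column name
parts <- do.call(rbind, strsplit(names(pct), "_", fixed = TRUE))

# Empty item x column matrix, filled by matching row and column names
items <- unique(parts[, 1])
cols  <- unique(parts[, 2])
tab <- matrix(NA_real_, nrow = length(items), ncol = length(cols),
              dimnames = list(items, cols))
tab[parts] <- pct

tab
```

Indexing tab with the two-column character matrix parts matches each value to its cell by name, so the result does not depend on the order of the variables in dat.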