R Frequency table for multiselect survey question across several columns


Question

I want to do a fairly common analysis of survey questions in R, but I am stuck partway through.

Imagine a survey that asks which brands you associate with certain features. For example, the brands could be PlayStation, XBox, ..., and the features could be "speed", "graphics", ..., where each brand can be checked for several features (i.e. multi-select). Something like this: https://www.harvestyourdata.com/fileadmin/images/question-type-screenshots/Grid-multi-select.jpg

Such questions are often called multi-select grid or matrix questions.

From a data perspective, this kind of data is usually stored in wide format, where each row*column combination is one variable coded 0/1 (0 if the survey participant did not check the box, 1 otherwise).

Assuming we have 5 brands and 10 items, we would have 50 variables in total, ideally following a structured naming scheme, e.g. item1_column1, item2_column1, item3_column1, [...], item1_column2, and so on.

Now I want to produce frequency tables for all of these variables in one pass. I've already found the cross.multi.table function in the questionr package. However, it only allows analyzing all items against a single factor. What I need instead is to handle several columns at the same time.

Any ideas? Maybe I'm missing a function from another package, or this can easily be done with the tidyverse or even with cross.multi.table?

Using this data as test input:

dat = data.frame(item1_column1 = c(0,1,1,1),
                 item2_column1 = c(1,1,1,0),
                 item3_column1 = c(0,0,1,1),
                 item1_column2 = c(1,1,1,0),
                 item2_column2 = c(0,1,1,1),
                 item3_column2 = c(1,0,1,1),
                 item1_column3 = c(0,1,1,0),
                 item2_column3 = c(1,1,1,1),
                 item3_column3 = c(0,0,1,0))

I'd expect this output:

         column1    column2    column3
item1    3          3          2
item2    3          3          4
item3    2          3          1

or, ideally, as proportions/percentages:

         column1    column2    column3
item1    75%        75%        50%
item2    75%        75%        100%
item3    50%        75%        25%

Recommended answer

One way is to reshape the data into long format with gather, split the key into item and column at the underscore with separate, group_by item and column, compute the mean of the value column as a percentage, and spread the result back to wide format.

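library(dplyr)
library(tidyr)

dat %>%
  # long format: one row per respondent and variable
  gather(key, value) %>%
  # "item1_column1" -> item = "item1", column = "column1"
  separate(key, into = c("item", "column"), sep = "_") %>%
  group_by(item, column) %>%
  # the mean of a 0/1 variable is the share of checked boxes
  summarise(prop = mean(value) * 100) %>%
  # back to wide format: one row per item, one column per grid column
  spread(column, prop)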

#  item  column1 column2 column3
#  <chr>   <dbl>   <dbl>   <dbl>
#1 item1      75      75      50
#2 item2      75      75     100
#3 item3      50      75      25
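
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. As a minimal sketch (not part of the original answer), the same pipeline could be written as below, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 and assuming every variable name contains exactly one underscore. The name `res` is only introduced for illustration.

```r
library(dplyr)
library(tidyr)

# Test data from the question
dat <- data.frame(item1_column1 = c(0, 1, 1, 1),
                  item2_column1 = c(1, 1, 1, 0),
                  item3_column1 = c(0, 0, 1, 1),
                  item1_column2 = c(1, 1, 1, 0),
                  item2_column2 = c(0, 1, 1, 1),
                  item3_column2 = c(1, 0, 1, 1),
                  item1_column3 = c(0, 1, 1, 0),
                  item2_column3 = c(1, 1, 1, 1),
                  item3_column3 = c(0, 0, 1, 0))

res <- dat %>%
  # Reshape to long format and split each name at "_" in one step
  pivot_longer(everything(),
               names_to = c("item", "column"),
               names_sep = "_",
               values_to = "value") %>%
  group_by(item, column) %>%
  # Share of respondents who checked the box, in percent
  summarise(prop = mean(value) * 100, .groups = "drop") %>%
  # One row per item, one column per grid column
  pivot_wider(names_from = column, values_from = prop)

print(res)
```

Passing a two-element names_to together with names_sep replaces the separate step, and .groups = "drop" returns an ungrouped tibble.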

A slightly shorter variant of the gather/spread pipeline (thanks to @M-M):

dat %>%
  # percentage of checked boxes per variable (a single summary row)
  summarise_all(~mean(.) * 100) %>%
  gather(key, value) %>%
  separate(key, into = c("item", "column"), sep = "_") %>%
  spread(column, value)
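
Both pipelines above return percentages. If the raw counts from the first expected table are wanted as well, a base R sketch along the following lines may be enough. It is not from the original answer; it assumes the same item_column naming scheme with a single underscore, and the names `parts`, `counts` and `percent` are only introduced for illustration.

```r
# Test data from the question
dat <- data.frame(item1_column1 = c(0, 1, 1, 1),
                  item2_column1 = c(1, 1, 1, 0),
                  item3_column1 = c(0, 0, 1, 1),
                  item1_column2 = c(1, 1, 1, 0),
                  item2_column2 = c(0, 1, 1, 1),
                  item3_column2 = c(1, 0, 1, 1),
                  item1_column3 = c(0, 1, 1, 0),
                  item2_column3 = c(1, 1, 1, 1),
                  item3_column3 = c(0, 0, 1, 0))

# Split each variable name at "_" into its item and column parts;
# the result is a character matrix with one row per variable
parts <- do.call(rbind, strsplit(names(dat), "_", fixed = TRUE))

# colSums() counts the checked boxes per variable; tapply() arranges
# those counts into an item-by-column matrix
counts <- tapply(colSums(dat),
                 list(item = parts[, 1], column = parts[, 2]),
                 sum)

# Counts as a percentage of all respondents
percent <- 100 * counts / nrow(dat)

print(counts)
print(percent)
```

Because the result is an ordinary matrix, single cells can be looked up by name, e.g. counts["item2", "column3"].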
