How to use the R survey package to analyze multiple response questions in a weighted sample?

Question

I'm relatively new to R. How can I use the 'survey' package (http://r-survey.r-forge.r-project.org/survey/) to analyze a multiple response question for a weighted sample? The tricky bit is that more than one response can be ticked, so the responses are stored across several columns.

I have survey data from 500 respondents who were drawn randomly from across 10 districts. Let's say the main question asked (stored in column H1_AreYouHappy) was: 'Are you happy?' - Yes / No / Don't know

The respondent is then asked a follow-up question: 'WHY are you (un)happy?' This is a multiple choice question and more than one response box can be ticked, so responses are stored in separate columns, for example:

H1Yes_Why1 (0/1, i.e. box ticked or not ticked) - 'Because of the economy';

H1Yes_Why2 (0/1) - 'Because I'm healthy';

H1Yes_Why3 (0/1) - 'Because of my social life'.

districts <- c('Green', 'Red','Orange','Blue','Purple','Grey','Black','Yellow','White','Lavender')
myDataFrame <- data.frame(H1_AreYouHappy=sample(c('Yes','No','Dont Know'),500,replace=TRUE), 
                          H1Yes_Why1 = sample(0:1,500,replace=TRUE), 
                          H1Yes_Why2 = sample(0:1,500,replace=TRUE), 
                          H1Yes_Why3 = sample(0:1,500,replace=TRUE), 
                          District = sample(districts,500,replace=TRUE), stringsAsFactors=TRUE)

I'm using the R 'survey' package to apply post-stratification weights based on the actual population size of each district:

library(survey)
# Create an unweighted survey object
mySurvey.unweighted <- svydesign(ids=~1, data=myDataFrame)

# Choose which variable contains the sample distribution to be weighted by
sample.distribution <- list(~District)

# Specify (from Census data) how often each level occurs in the population
population.distribution <- data.frame(District = c('Green', 'Red','Orange','Blue','Purple','Grey','Black','Yellow','White','Lavender'),
                              freq = c(0.1824885, 0.0891206, 0.1381343, 0.1006533, 0.1541269, 0.0955853, 0.0268172, 0.0398353, 0.0809459, 0.0922927))

# Apply the weights
mySurvey.rake <- rake(design = mySurvey.unweighted, sample.margins=sample.distribution, population.margins=list(population.distribution))

# Calculate the weighted mean for the main question
svymean(~H1_AreYouHappy, mySurvey.rake)

# How can I calculate the WEIGHTED means for the multiple choice - multiple response follow-up question?

How can I calculate the WEIGHTED means for the multiple choice question (i.e. across the 0/1 response columns)?

If I wanted it unweighted, I could just use this function, which calculates the frequencies across all columns whose names match my prefix 'H1Yes_Why':

multipleResponseFrequencies = function(data, question.prefix) {
  # Find the columns with the questions
  a = grep(question.prefix, names(data))
  # Find the total number of responses
  b = sum(data[, a] != 0)
  # Find the totals for each question
  d = colSums(data[, a] != 0)
  # Find the number of respondents
  e = sum(rowSums(data[,a]) !=0)
  # Combine d and b into one vector: the per-question counts followed by the overall total
  f = as.numeric(c(d, b))
  result <- data.frame(question = c(names(d), "Total"),
                       freq = f,
                       percent = (f/b)*100,
                       percentofcases = (f/e)*100)
  result
}
multipleResponseFrequencies(myDataFrame, 'H1Yes_Why')

Any help would be greatly appreciated.

Recommended answer

I think you want

svyratio( ~ H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3 , ~ as.numeric( H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3 ) , mySurvey.rake)
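To make it easier to see what that ratio estimates: svyratio divides a weighted total by a weighted total, so each ratio here should be the weighted number of ticks for one reason divided by the weighted number of ticks overall, i.e. the weighted analogue of the 'percent' column in the question's function. The following is only a minimal base-R sketch of that arithmetic, not the survey package's implementation. The toy data frame, the pop.share values and the weight formula (population share divided by sample share, which is what weighting on a single margin amounts to) are assumptions made for illustration.

```r
# Toy data (hypothetical): 6 respondents, 2 districts, three 0/1 tick-box columns.
toy <- data.frame(
  District   = c("Green", "Green", "Green", "Green", "Red", "Red"),
  H1Yes_Why1 = c(1, 0, 1, 0, 1, 1),
  H1Yes_Why2 = c(1, 1, 0, 0, 0, 1),
  H1Yes_Why3 = c(0, 0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Assumed population shares (illustrative, not the Census figures in the question).
pop.share <- c(Green = 0.5, Red = 0.5)

# Share of the sample that falls in each respondent's district.
sample.share <- ave(rep(1, nrow(toy)), toy$District, FUN = sum) / nrow(toy)

# Post-stratification weight per respondent: population share / sample share.
w <- unname(pop.share[toy$District]) / sample.share

# Matrix of the 0/1 response columns that share the question prefix.
ticks <- as.matrix(toy[, grep("^H1Yes_Why", names(toy))])

# Weighted number of ticks per reason (each row is multiplied by its weight).
weighted.ticks <- colSums(ticks * w)

# Weighted number of ticks overall, and weighted number of respondents
# who ticked at least one box.
weighted.total <- sum(weighted.ticks)
weighted.cases <- sum(w[rowSums(ticks) != 0])

# Same layout as the unweighted function in the question.
f <- c(weighted.ticks, Total = weighted.total)
result <- data.frame(
  question       = names(f),
  freq           = as.numeric(f),
  percent        = as.numeric(f) / weighted.total * 100,
  percentofcases = as.numeric(f) / weighted.cases * 100
)
result
```

With the design object from the question, the per-respondent weights could presumably be taken from weights(mySurvey.rake) instead of being computed by hand. Their overall scale does not matter for the percentages, because it cancels between numerator and denominator. If the share of respondents ticking each box is wanted rather than the share of all ticks, svymean on the 0/1 columns should give that directly, since the weighted mean of a 0/1 column is a weighted proportion.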
