Run svymean on all variables
Question
------ Short story--------
I would like to run svymean on all variables in the dataset (assuming they are all numeric). I've pulled this narrative from this guide over here: https://stylizeddata.com/how-to-use-survey-weights-in-r/
I know I can run svymean on all the variables by listing them out like this:
svymean(~age+gender, ageDesign, na.rm = TRUE)
However, my real dataset is 500 variables long (they are all numeric), and I need to get the means all at once more efficiently. I tried the following but it does not work.
svymean(~., ageDesign, na.rm = TRUE)
Any ideas?
--------- Long explanation with real data-----
library(haven)
library(survey)
library(dplyr)
Import NHANES demographic data
nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))
Copy and rename variables so they are more intuitive. "fpl" is the ratio of family income to the federal poverty level (INDFMPIR). It ranges from 0 to 5.
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA
Since there are 47 variables, we will select only the variables we will use in this analysis.
nhanesAnalysis <- nhanesDemo %>%
  select(fpl,
         age,
         gender,
         persWeight,
         psu,
         strata)
Survey weights
Here we use "svydesign" to assign the weights. We will use this new design variable "nhanesDesign" when running our analyses.
nhanesDesign <- svydesign(id = ~psu,
                          strata = ~strata,
                          weights = ~persWeight,
                          nest = TRUE,
                          data = nhanesAnalysis)
Here we use "subset" to tell "nhanesDesign" that we only want to look at a specific subpopulation (i.e., those aged 18 to 79 years). This step is important: if you restrict the data some other way, for example by filtering rows before creating the design, your estimates won't have correct SEs.
ageDesign <- subset(nhanesDesign, age > 17 &
                      age < 80)
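The difference between subsetting the design and filtering the data first can be checked on a small example. The sketch below is not part of the original guide: it assumes the `api` example data bundled with the survey package instead of NHANES, and the object names (`fullDesign`, `domainDesign`, `filteredDesign`) are made up for illustration.

```r
library(survey)

# Bundled example data from the survey package (an assumption; not the NHANES data).
data(api)

# Build the full design first, then subset the design object.
fullDesign <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)
domainDesign <- subset(fullDesign, stype == "E")

# Contrast: filter the data frame first, then build a design from the remaining rows.
filteredData <- apiclus1[apiclus1$stype == "E", ]
filteredDesign <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = filteredData)

domainMean <- svymean(~api00, domainDesign)
filteredMean <- svymean(~api00, filteredDesign)

# The point estimates agree; the standard errors need not.
print(coef(domainMean))
print(coef(filteredMean))
print(SE(domainMean))
print(SE(filteredMean))
```

Comparing the two printed standard errors shows whether the shortcut matters for a given design; the subset-on-design version is the one that keeps the full sampling structure.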
Statistics
We will use "svymean" to calculate the population mean for age. Setting na.rm = TRUE excludes missing values from the calculation. We see that the mean age is 45.648 and the standard error is 0.5131.
svymean(~age, ageDesign, na.rm = TRUE)
Recommended answer
You can use reformulate to construct the formula dynamically.
library(survey)
svymean(reformulate(names(nhanesAnalysis)), ageDesign, na.rm = TRUE)
# mean SE
#fpl 3.0134 0.1036
#age 45.4919 0.5273
#gender 1.5153 0.0065
#persWeight 80773.3847 5049.1504
#psu 1.5102 0.1330
#strata 126.1877 0.1506
This gives the same output as specifying each column individually in the function.
svymean(~age + fpl + gender + persWeight + psu + strata, ageDesign, na.rm = TRUE)
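Two things in the output above are worth a closer look. First, `names(nhanesAnalysis)` also includes the design columns, so the result reports means for persWeight, psu and strata, which are rarely meaningful. Second, the mean age here (45.4919) differs from the single-variable result in the question (45.648); as far as I can tell this is because, with na.rm = TRUE, svymean drops every row that has a missing value in any of the listed variables. The following is a minimal sketch of one way to handle both, assuming the `api` example data bundled with the survey package instead of NHANES; `apiAnalysis`, `design_cols` and `analysis_cols` are hypothetical names chosen for illustration.

```r
library(survey)

# Bundled example data from the survey package, standing in for the NHANES data.
data(api)

# Hypothetical analysis frame: four outcome columns plus the design columns.
design_cols <- c("dnum", "pw", "fpc")
apiAnalysis <- apiclus1[, c("api00", "api99", "meals", "ell", design_cols)]

apiDesign <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiAnalysis)

# Drop the design columns, then build a one-sided formula from what is left.
analysis_cols <- setdiff(names(apiAnalysis), design_cols)
f <- reformulate(analysis_cols)
print(f)

# All means in one call; with na.rm = TRUE, rows with any missing value are dropped.
all_means <- svymean(f, apiDesign, na.rm = TRUE)
print(all_means)

# One call per variable, so each mean uses every row where that variable is observed.
per_var <- lapply(analysis_cols, function(v) {
  svymean(reformulate(v), apiDesign, na.rm = TRUE)
})
names(per_var) <- analysis_cols
```

With 500 variables the per-variable loop is slower than a single call, but each estimate then matches what you would get by running svymean on that variable alone.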