Run svymean on all variables


Problem description

------ Short story ------

I would like to run svymean on all variables in a dataset (assuming they are all numeric). The example below is taken from this guide: https://stylizeddata.com/how-to-use-survey-weights-in-r/

I know I can run svymean on several variables by listing them in the formula like this:

svymean(~age+gender, ageDesign, na.rm = TRUE)

However, my real dataset has 500 variables (all numeric), so I need a more efficient way to get all the means at once. I tried the following, but it does not work.

svymean(~., ageDesign, na.rm = TRUE)

Any ideas?

------ Long explanation with real data ------

library(haven)
library(survey)
library(dplyr)
 

Import NHANES demographic data

nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))

Copy and rename variables so they are more intuitive. "fpl" is family income expressed as a ratio of the federal poverty level; it ranges from 0 to 5.

nhanesDemo$fpl        <- nhanesDemo$INDFMPIR
 
nhanesDemo$age        <- nhanesDemo$RIDAGEYR
 
nhanesDemo$gender     <- nhanesDemo$RIAGENDR
 
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
 
nhanesDemo$psu        <- nhanesDemo$SDMVPSU
 
nhanesDemo$strata     <- nhanesDemo$SDMVSTRA

Since the dataset has 47 variables, we select only the ones used in this analysis.

nhanesAnalysis <- nhanesDemo %>%
                    select(fpl,
                           age,
                           gender,
                           persWeight,
                           psu,
                           strata)
 

Survey weights

Here we use "svydesign" to assign the weights. We will use the resulting design object, "nhanesDesign", when running our analyses.

nhanesDesign <- svydesign(id      = ~psu,
                          strata  = ~strata,
                          weights = ~persWeight,
                          nest    = TRUE,
                          data    = nhanesAnalysis)

Here we use "subset" to tell "nhanesDesign" that we only want to look at a specific subpopulation (i.e., those aged 18 to 79 years). This step matters: if you restrict the data some other way instead of subsetting the design object, your estimates won't have correct SEs.

ageDesign <- subset(nhanesDesign, age > 17 &
                                  age < 80)

Statistics

We will use "svymean" to calculate the population mean for age. Setting na.rm = TRUE excludes missing values from the calculation. We see that the mean age is 45.648 and the standard error is 0.5131.

svymean(~age, ageDesign, na.rm = TRUE)


Recommended answer

You can use reformulate to construct the formula dynamically from the column names.
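As a quick illustration of what reformulate returns, here is a minimal base R sketch. The variable names in `vars` are only example names chosen for this illustration; any character vector of column names should behave the same way.

```r
# Variable names as a character vector (example names, chosen for illustration)
vars <- c("age", "gender", "fpl")

# reformulate() joins the names with "+" and returns a one-sided formula
f <- reformulate(vars)
print(f)
# ~age + gender + fpl
```

The resulting formula object can be passed anywhere a hand-written formula such as ~age + gender + fpl would be accepted.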

library(survey)
svymean(reformulate(names(nhanesAnalysis)), ageDesign, na.rm = TRUE)

#                 mean        SE
#fpl            3.0134    0.1036
#age           45.4919    0.5273
#gender         1.5153    0.0065
#persWeight 80773.3847 5049.1504
#psu            1.5102    0.1330
#strata       126.1877    0.1506

This gives the same output as specifying each column individually in the formula.

svymean(~age + fpl + gender + persWeight + psu + strata, ageDesign, na.rm = TRUE)
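Note that the output above also reports means for persWeight, psu and strata, which are design variables rather than analysis variables. The sketch below shows one possible way to leave them out and to collect the estimates in a plain data frame. It is not part of the original answer: `toy`, `toyDesign`, `designVars` and `resTable` are hypothetical names, and the toy data are randomly generated stand-ins for nhanesAnalysis so that the example runs without downloading anything.

```r
library(survey)

# Hypothetical toy data standing in for nhanesAnalysis; all values are made up
set.seed(1)
toy <- data.frame(
  fpl        = runif(40, 0, 5),
  age        = sample(18:79, 40, replace = TRUE),
  gender     = sample(1:2, 40, replace = TRUE),
  persWeight = runif(40, 1000, 5000),
  psu        = rep(1:2, times = 20),   # two PSUs within each stratum
  strata     = rep(1:4, each = 10)
)

toyDesign <- svydesign(id      = ~psu,
                       strata  = ~strata,
                       weights = ~persWeight,
                       nest    = TRUE,
                       data    = toy)

# Keep the numeric columns, then drop the design variables
designVars   <- c("persWeight", "psu", "strata")
numericVars  <- names(toy)[vapply(toy, is.numeric, logical(1))]
analysisVars <- setdiff(numericVars, designVars)

res <- svymean(reformulate(analysisVars), toyDesign, na.rm = TRUE)

# Collect estimates and standard errors in a plain data frame
resTable <- data.frame(variable = names(coef(res)),
                       mean     = unname(coef(res)),
                       SE       = unname(SE(res)))
print(resTable)
```

With the real data, the same pattern would presumably use names(nhanesAnalysis) and ageDesign in place of the toy objects. Keep in mind that with na.rm = TRUE and several variables in one formula, a row with a missing value in any listed variable is dropped for all of them, which may explain why the age mean in the multi-variable output differs from the single-variable result.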
