The f argument of the survey package does not give the expected output

Question

This follows up on an earlier question about how R's survey package handles interpolation for median estimates, which has not attracted much feedback. I have managed to boil the issue down to the following:

I'm using R's survey package to get the median estimate for a set of data. The data needed to reproduce this issue are available as dput text here.

The design I'm using is an object of class svyrep.design, defined as follows:

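library(survey)
# Stratified design with person-level weights
design <- svydesign(id = ~id_directorio, strata = ~estrato, weights = ~f_pers, check.strata = TRUE, data = datos)
set.seed(234262762)
# Convert to a replicate-weights design with 20 bootstrap replicates
repdesign <- as.svrepdesign(design, type = "subbootstrap", replicates = 20)
options(survey.lonely.psu = "adjust")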

A svyquantile inside a svyby does the job as expected:

svyby(formula = ~ing_t_p, by = ~CL_GRUPO_OCU_08, repdesign, svyquantile, quantiles=c(0.5),  method="constant", 
      f = 0.5, ties = "rounded", vartype=c("ci", "se"), ci=TRUE, na.rm=FALSE)

         CL_GRUPO_OCU_08         V1        se         cv        cv%
ISCO08_1        ISCO08_1 1002513.04 269630.31 0.26895442  26.895442
ISCO08_2        ISCO08_2  744505.53  68827.09 0.09244672   9.244672
ISCO08_3        ISCO08_3  489789.32  42839.16 0.08746447   8.746447
ISCO08_4        ISCO08_4  449806.52  69526.34 0.15456944  15.456944
ISCO08_5        ISCO08_5  286705.37  13392.01 0.04671002   4.671002
ISCO08_6        ISCO08_6  449613.04       NaN        NaN        NaN
ISCO08_7        ISCO08_7   93032.83 109534.62 1.17737600 117.737600
ISCO08_8        ISCO08_8  564514.15 437752.31 0.77544967  77.544967
ISCO08_9        ISCO08_9  293712.84  24497.97 0.08340790   8.340790

However, look at the estimate for category ISCO08_6. It's not giving the expected median. Instead, it shows the smaller of the two values:

library(dplyr)
datos %>% filter(CL_GRUPO_OCU_08 == "ISCO08_6")

# A tibble: 2 x 5
  id_directorio estrato f_pers ing_t_p CL_GRUPO_OCU_08
          <dbl>   <dbl>  <dbl>   <dbl> <chr>          
1         24568    2021   98.7 449613. ISCO08_6       
2         24568    2021   98.7 551525. ISCO08_6    

The f argument should deal with this (it controls the interpolation), and indeed it does in all the other cases, but it has no effect on the ISCO08_6 row. I have found that this issue affects estimates for groups with only 2 or 4 data points.

So how do I get the median using this method when the number of data points is small?

Recommended answer

OK, it looks as though you need to ask for a quantile very slightly larger than 0.5 to get what you want. I will look into whether this is a bug or whether it was necessary for agreement with some other system such as SUDAAN. I will either fix or document this for the next version (or perhaps add yet another option). Quantiles are the worst.

Here it is using only svyquantile():

> svyquantile(~ing_t_p, quantile=0.5000001, design=dd, f=0.5, ties="rounded", method="constant")
             0.5
ing_t_p 500569.2
> svyquantile(~ing_t_p, quantile=0.5000001, design=dd, f=0, ties="rounded", method="constant")
           0.5
ing_t_p 449613
> svyquantile(~ing_t_p, quantile=0.5000001, design=dd, f=1, ties="rounded", method="constant")
             0.5
ing_t_p 551525.3
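
The three results above can be reproduced in base R under one assumption that this thread does not confirm: that svyquantile() reads the quantile off a step function built from cumulative weight shares, in the way approx() does with method="constant". With two equally weighted observations the cumulative shares are exactly 0.5 and 1, so a request for exactly 0.5 lands on a knot and returns that knot's value whatever f is, while a request just above 0.5 falls between the knots and gets the f-weighted mix. The names income, cum_share and q_at are illustrative, and the incomes are the rounded values printed for ISCO08_6.

```r
# Two ISCO08_6 observations with equal weights (incomes rounded as printed)
income    <- c(449613, 551525)
weights   <- c(98.7, 98.7)
cum_share <- cumsum(weights) / sum(weights)   # 0.5 and 1

# Hypothetical helper: quantile read off a step function via approx().
# Assumption: this mirrors, but is not, the survey package's internal code.
q_at <- function(p, f) {
  approx(cum_share, income, xout = p, method = "constant", f = f,
         yleft = min(income), yright = max(income))$y
}

q_at(0.5,       f = 0.5)  # 449613: exactly on a knot, f is ignored
q_at(0.5000001, f = 0.5)  # 500569: (1 - f) * left + f * right
q_at(0.5000001, f = 0)    # 449613: left value only
q_at(0.5000001, f = 1)    # 551525: right value only
```

If this reading is right, groups with 2 or 4 equally weighted points are affected because their cumulative weight share hits 0.5 exactly at an observation.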

And here it is using svyby(). Note that you have to name the first argument explicitly as formula=; otherwise R's partial argument matching interprets f=0.5 as formula=0.5.

> svyby(formula=~ing_t_p, by = ~CL_GRUPO_OCU_08, design, svyquantile, quantiles=c(0.5000001),f=0.5, method="constant", vartype=c("ci", "se"), ci=TRUE, na.rm.all=FALSE)
         CL_GRUPO_OCU_08    ing_t_p        se      ci_l      ci_u
ISCO08_1        ISCO08_1 1002513.04 254418.31 550769.11 1629454.6
ISCO08_2        ISCO08_2  749355.06  62294.16 649720.53  899613.0
ISCO08_3        ISCO08_3  489789.32  32140.54 409819.42  538808.8
ISCO08_4        ISCO08_4  449806.52  74549.55 349699.00  650000.0
ISCO08_5        ISCO08_5  286705.37  15349.64 240706.43  301766.1
ISCO08_6        ISCO08_6  500569.18       NaN       NaN       NaN
ISCO08_7        ISCO08_7   93032.83 108653.60  55000.00  503500.0
ISCO08_8        ISCO08_8  564514.15 429428.77  80470.95 2061000.0
ISCO08_9        ISCO08_9  293712.84  18830.76 245000.00  320539.5
There were 12 warnings (use warnings() to see them)
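
The note about formula= comes from R's partial matching of argument names, which can be seen without the survey package. The sketch below uses a hypothetical stand-in, toy_by, that only shares its leading argument names (formula, by) with svyby(); it is not the real function.

```r
# Hypothetical stand-in with the same leading argument names as svyby()
toy_by <- function(formula, by, ...) {
  list(formula = formula, by = by, extra = list(...))
}

# First argument unnamed: f = 0.5 partially matches `formula`,
# and the real formula is pushed into `...`
unnamed <- toy_by(~ing_t_p, by = ~CL_GRUPO_OCU_08, f = 0.5)
unnamed$formula   # 0.5, not the formula

# First argument named: `formula` is already taken, so f falls through to `...`
named <- toy_by(formula = ~ing_t_p, by = ~CL_GRUPO_OCU_08, f = 0.5)
named$extra$f     # 0.5, available to be passed on to svyquantile()
```

Naming formula= explicitly is therefore what lets f travel through `...` to the function that actually uses it.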
