Using tbl_summary to create summary statistics with labels
Question
I have read a Stata (.dta) file into R, and a snippet of the data looks like this:
short
# A tibble: 200 x 5
q4_1 q4_2 q4_3 q4_4 treatment_cur
<dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <chr>
1 NA(z) NA(z) NA(z) NA(z) Control
2 NA(z) NA(z) NA(z) NA(z) Control
3 1 [1.Yes] 0 [0.No] 0 [0.No] 1 [1.Yes] Treatment
4 0 [0.No] 0 [0.No] 1 [1.Yes] 0 [0.No] Control
5 0 [0.No] 0 [0.No] 0 [0.No] 1 [1.Yes] Control
6 NA(z) NA(z) NA(z) NA(z) Control
7 1 [1.Yes] 1 [1.Yes] 1 [1.Yes] 1 [1.Yes] Control
8 NA(z) NA(z) NA(z) NA(z) Treatment
9 NA(z) NA(z) NA(z) NA(z) Control
10 0 [0.No] 0 [0.No] 1 [1.Yes] 0 [0.No] Control
The variables are structured as follows:
str(short)
tibble [200 x 5] (S3: tbl_df/tbl/data.frame)
$ q4_1 : dbl+lbl [1:200] NA(z), NA(z), 1, 0, 0, NA(z), 1, NA(z), NA(z), 0, NA(z), 1, NA(z), 1, NA(z), 1, ...
..@ label : chr "q4_1r.Do you have any of ...assignments? Bilingual/ELL"
..@ format.stata: chr "%15.0g"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "0.No" "1.Yes"
$ q4_2 : dbl+lbl [1:200] NA(z), NA(z), 0, 0, 0, NA(z), 1, NA(z), NA(z), 0, NA(z), 0, NA(z), 0, NA(z), 0, ...
..@ label : chr "q4_2r.Do you have any of ...assignments? Sp Ed (self-c)"
..@ format.stata: chr "%34.0g"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "0.No" "1.Yes"
$ q4_3 : dbl+lbl [1:200] NA(z), NA(z), 0, 1, 0, NA(z), 1, NA(z), NA(z), 1, NA(z), 1, NA(z), 1, NA(z), 0, ...
..@ label : chr "q4_3r.Do you have any of ...assignments? Sp Ed (incl.)"
..@ format.stata: chr "%72.0g"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "0.No" "1.Yes"
$ q4_4 : dbl+lbl [1:200] NA(z), NA(z), 1, 0, 1, NA(z), 1, NA(z), NA(z), 0, NA(z), 1, NA(z), 0, NA(z), 0, ...
..@ label : chr "q4_4r.Do you have any of ...assignments? Gifted/Talented"
..@ format.stata: chr "%17.0g"
..@ labels : Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "0.No" "1.Yes"
$ treatment_cur: chr [1:200] "Control" "Control" "Treatment" "Control" ...
..- attr(*, "label")= chr "treatment_cur.treatment_cur"
..- attr(*, "format.stata")= chr "%9s"
This is the class of each variable:
> class(short$q4_1)
[1] "haven_labelled" "vctrs_vctr" "double"
I need to create descriptive tabulations of the data using tbl_summary from library(gtsummary), which is a really cool package for creating quick and customizable summary statistics.
The cool thing about my data is that each value already has a label associated with it. For example, in q4_2, 0 is "No" and 1 is "Yes". So when I feed the data into tbl_summary, instead of this showing up in the frequency count:
q4_1 n
1 7
0 8
this would show up, which is what I want:
"q4_1r.Do you have any of ...assignments? Bilingual/ELL"
n
No 7
Yes 8
This code does not work, because tbl_summary only accepts certain classes:
tbl_summary(short)
Column(s) ‘q4_1’, ‘q4_2’, ‘q4_3’, and ‘q4_4’ omitted from output.
Accepted classes are ‘character’, ‘factor’, ‘numeric’, ‘logical’, ‘integer’, or ‘difftime’.
If I convert these variables to character, they lose their value label attributes, and I only see the following:
q4_1 n
1 7
0 8
Are there any ideas for how I can work around this? I can't find a built-in R dataset with this type of variable format to make this more reproducible.
Recommended answer
The haven labelled class was never meant to be used in analysis or data exploration. Rather, it was created as an in-between step when importing data from other languages whose data types don't have a one-to-one relationship with R's. This is from a tidyverse article about the haven labelled class of variables (https://haven.tidyverse.org/articles/semantics.html):
The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate data structure that you can convert into a regular R data frame.
To use tbl_summary(), you'll first want to apply the as_factor() function to your imported data frame, e.g. haven::as_factor(short). This will convert your data frame to base R types and retain the Stata value labels as factor levels.
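A minimal sketch of that workflow, assuming the haven and gtsummary packages are installed. Since the original .dta file isn't available, the data frame below is a small hypothetical stand-in built with haven::labelled(); the values, the plain NA entries (used here in place of the tagged NA(z) values), and the shortened variable label are made up for illustration.

```r
library(haven)      # labelled(), as_factor()
library(gtsummary)  # tbl_summary()

# Hypothetical stand-in for the imported Stata data (values are illustrative)
short <- tibble::tibble(
  q4_1 = labelled(
    c(NA, 1, 0, 0, 1, NA, 0, 1),
    labels = c("0.No" = 0, "1.Yes" = 1),   # value labels, as in the Stata file
    label  = "q4_1r. Bilingual/ELL"        # variable label (shortened here)
  ),
  treatment_cur = c("Control", "Treatment", "Control", "Control",
                    "Treatment", "Control", "Control", "Treatment")
)

class(short$q4_1)   # includes "haven_labelled"

# Convert every labelled column to a factor; value labels become the levels
short_fct <- as_factor(short)

levels(short_fct$q4_1)   # the value labels are now factor levels

# The converted data frame can be passed to tbl_summary()
tbl <- tbl_summary(short_fct)
tbl
```

The variable label attribute should also survive the conversion, so tbl_summary() can use it as the row header for q4_1 rather than the bare column name.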
FYI, we are making tbl_summary() compatible with all types, and in the next release of the package the as_factor() step will not be required. You can follow the progress of the implementation here: https://github.com/ddsjoberg/gtsummary/pull/603