Output each factor level as a dummy variable in a stargazer summary statistics table
Question
I'm using the R package stargazer to create high-quality regression tables, and I would like to use it to create a summary statistics table. I have a factor variable in my data, and I would like the summary table to show me the percent in each category of the factor -- in effect, separate the factor into a set of mutually exclusive logical (dummy) variables, and then display those in the table. Here's an example:
> library(car)
> library(stargazer)
> data(Blackmore)
> stargazer(Blackmore[, c("age", "exercise", "group")], type = "text")
==========================================
Statistic  N   Mean  St. Dev.  Min   Max
------------------------------------------
age       945 11.442  2.766   8.000 17.920
exercise  945 2.531   3.495   0.000 29.960
------------------------------------------
But I'm trying to get an additional row that shows me the percent in each group (% control and/or % patient, in these data). I'm sure this is just an option somewhere in stargazer, but I can't find it. Does anyone know what it is?
Update: the dataset formerly spelled car::Blackmoor has since been renamed car::Blackmore.
Recommended answer
Since Stargazer can't do this directly, you can create your own summary table as a data frame and output that using pander, xtable, or any other package. For example, here's how you can use dplyr and tidyr to create a summary table:
library(dplyr)
library(tidyr)
fancy.summary <- Blackmore %>%
  select(-subject) %>%  # Remove the subject column
  group_by(group) %>%  # Group by patient and control
  # Calculate summary statistics for each group; columns are named like age_mean
  summarise(across(everything(), list(mean = mean, sd = sd, min = min, max = max, length = length))) %>%
  mutate(prop = age_length / sum(age_length)) %>%  # Proportion of observations in each group
  gather(variable, value, -group, -prop) %>%  # Convert to long
  separate(variable, c("variable", "statistic")) %>%  # Split variable column
  mutate(statistic = ifelse(statistic == "length", "n", statistic)) %>%  # Rename length to n
  spread(statistic, value) %>%  # Make the statistics be actual columns
  select(group, variable, n, mean, sd, min, max, prop)  # Reorder columns
Which results in this if you use pander:
library(pander)
pandoc.table(fancy.summary)
------------------------------------------------------
 group   variable   n   mean   sd    min   max   prop
------- ---------- --- ------ ----- ----- ----- ------
control    age     359 11.26  2.698   8   17.92 0.3799
control  exercise  359 1.641  1.813   0   11.54 0.3799
patient    age     586 11.55  2.802   8   17.92 0.6201
patient  exercise  586 3.076  4.113   0   29.96 0.6201
------------------------------------------------------
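If you would rather keep stargazer for the output, one possible workaround is to expand the factor into 0/1 dummy columns yourself before passing the data frame in. The sketch below is not a stargazer option, just base R preprocessing. It uses a small made-up data frame (df, with hypothetical values) as a stand-in for Blackmore, and the names dummies and summary.input are illustrative only. Since the mean of a 0/1 column is the share of rows in that level, the Mean column of the resulting table should show the proportion in each group.

```r
# Toy stand-in for Blackmore (hypothetical values); swap in the real data frame.
df <- data.frame(
  age      = c(8.0, 10.5, 12.0, 14.2, 9.1, 16.3),
  exercise = c(2.7, 1.9, 0.5, 4.4, 3.0, 6.1),
  group    = factor(c("control", "patient", "patient",
                      "control", "patient", "patient"))
)

# One 0/1 column per factor level; "- 1" drops the intercept so no level is omitted.
dummies <- model.matrix(~ group - 1, data = df)

# Strip the "group" prefix so the columns are named "control" and "patient".
colnames(dummies) <- sub("^group", "", colnames(dummies))

# Numeric columns plus the dummy columns; the mean of a dummy is that level's share.
summary.input <- cbind(df[, c("age", "exercise")], dummies)

# Only call stargazer if it is installed, so the sketch runs either way.
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(summary.input, type = "text")
}
```

With the real data you would replace df with Blackmore. The St. Dev., Min and Max entries for the dummy rows are not very informative, so you may want to restrict which statistics are shown.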