How to create count and percentage tables and linegraphs with 1 independent variable and 3 dependent ones


Problem description

I'm an R neophyte, and somehow this problem seems like it should be trivial to solve. But unfortunately, I haven't been able to do so after about three days of searching and experimenting.

My data is in a form close to wide format:

color   agegroup    sex     ses
red     2           Female  A
blue    2           Female  C
green   5           Male    D
red     3           Female  A
red     2           Male    B
blue    1           Female  B
...

I'm trying to create presentable tables with counts and percentages of the dependent variable (color here) organized by sex, ses and agegroup. I need one table organized by ses and sex for each agegroup, with counts next to the percentages, like this:

agegroup:                                  1
sex:                  Female                               Male
ses:        A       B       C       D           A       B       C       D
color:
red         2 1%    0  0%   8 4%    22 11%      16 8%   2   1%  8   4%  3 1.5%
blue        9 4.5%  6  3%   4 2%    2  1%       12 6%   32 16%  14  7%  6   3%
green       4 2%    12 6%   2 1%    8  4%       0  0%   22 11%  40 20%  0   0%

agegroup:                               2
sex:                  Female                               Male
ses:        A       B       C       D           A       B       C       D
color:
red         2 1%    0  0%   8 4%    22 11%      16 8%   2   1%  8   4%  3 1.5%
blue        9 4.5%  6  3%   4 2%    2  1%       12 6%   32 16%  14  7%  6   3%
green       4 2%    12 6%   2 1%    8  4%       0  0%   22 11%  40 20%  0   0%

I've been trying to do this with everything from data.table and expss to gmodels, but I just can't figure out how to get output like this. CrossTable from gmodels comes closest, but it's still pretty far off: (1) it puts percentages under counts, (2) I can't get it to nest ses under sex, (3) I can't figure out how to get it to disaggregate the results by generation (agegroup), and (4) the output is full of dashes, vertical pipes and spaces, which makes putting it into a word processor or spreadsheet an error-prone manual affair.

I removed my second question (about line plots), because the answer to the first question is perfect and deserves credit, even if it doesn't touch on the second one. I'll ask the second question separately, as I should have from the start.

Recommended answer

The closest result I could get with the expss package:

library(expss)
# generate example data
set.seed(123)
N = 300
df = data.frame(
    color = sample(c("red", "blue", "green"), size = N, replace = TRUE),
    agegroup = sample(1:5, size = N, replace = TRUE),
    sex = sample(c("Male", "Female"), size = N, replace = TRUE),
    ses = sample(c("A", "B", "C", "D"),  size = N, replace = TRUE),
    stringsAsFactors = FALSE
)

# redirect output to RStudio HTML viewer
expss_output_viewer()
res = df %>% 
    tab_cells("|" = color) %>% # dependent variable, "|" used to suppress label
    tab_cols(sex %nest% ses) %>% # column variable
    tab_rows(agegroup) %>% 
    tab_total_row_position("none") %>% # we don't need total
    tab_stat_cases(label = "Cases") %>% # calculate cases
    tab_stat_cpct(label = "%") %>% # calculate percent
    tab_pivot(stat_position = "inside_columns") %>% # finalize table
    make_subheadings(number_of_columns = 2)

# difficult part - add percent sign
for(i in grep("%", colnames(res))){
    res[[i]] = ifelse(trimws(res[[i]])!="", 
                      paste0(round(res[[i]], 1), "%"),
                      res[[i]] 
                      )
}

# additionally remove stat labels
colnames(res) = gsub("\\|Cases|%", "", colnames(res), perl = TRUE)

res

In the RStudio Viewer the result will be shown in HTML format (see image). Unfortunately, I can't test how it will paste into MS Word. Disclaimer: I am an author of the expss package.
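Since pasting HTML into a word processor is untested, a base R cross-check that writes plain CSV files may be a useful fallback for the spreadsheet case. The following is only a sketch, not the expss approach: `count_pct_table()` is a hypothetical helper written for this example, the data is generated the same way as above (with factors so every table has the same rows and columns), and the percentages are assumed to be shares of each age group's total, which is what the sample layout in the question appears to show.

```r
# Base R only; no extra packages are needed.
set.seed(123)
N <- 300
df <- data.frame(
    color = factor(sample(c("red", "blue", "green"), N, replace = TRUE),
                   levels = c("red", "blue", "green")),
    agegroup = sample(1:5, N, replace = TRUE),
    sex = factor(sample(c("Male", "Female"), N, replace = TRUE),
                 levels = c("Female", "Male")),
    ses = factor(sample(c("A", "B", "C", "D"), N, replace = TRUE))
)

# count_pct_table() is a hypothetical helper, not part of any package.
# It returns a character matrix with "count percent%" in every cell.
count_pct_table <- function(d) {
    # columns: ses nested within sex (Female|A, ..., Female|D, Male|A, ...)
    cols <- interaction(d$sex, d$ses, sep = "|", lex.order = TRUE)
    counts <- table(d$color, cols)
    # percent of all cases in this age group (assumption);
    # prop.table(counts, 2) would give column percentages instead
    pct <- round(100 * prop.table(counts), 1)
    matrix(paste0(counts, " ", pct, "%"),
           nrow = nrow(counts),
           dimnames = list(rownames(counts), colnames(counts)))
}

# one table per age group
tables <- lapply(split(df, df$agegroup), count_pct_table)
tables[["1"]]

# one CSV file per age group, written to a temporary directory
out_dir <- tempdir()
for (g in names(tables)) {
    write.csv(tables[[g]], file.path(out_dir, paste0("agegroup_", g, ".csv")))
}
```

Each CSV keeps the count and the percentage together in one cell, so it opens in a spreadsheet without the dashes and pipes that CrossTable prints. Replace `tempdir()` with a real folder to keep the files.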
