Need to create a crosstab with 2 categorical factor variables?

Problem description

I have two categorical variables, income level and temporary visa status, and the count for each combination.

All I need is a crosstab from which to build a proportional bar chart, so that I can get the proportion of each temporary visa category within an income level.

library(readxl)
Crosstab_Temporary_visas_income <- read_excel("C:/Users/axelp/Documents/RMIT/Semester 2/Data Visualisation/Assignment 3/Crosstab Temporary visas income.xls")

str(Crosstab_Temporary_visas_income)

margin.table(Crosstab_Temporary_visas_income,1) #Row marginals

Error in margin.table(Crosstab_Temporary_visas_income, 1) : 
  'x' is not an array

> str(Crosstab_Temporary_visas_income)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   9 obs. of  6 variables:
 $ Income                  : chr  "Negative / nil income" "$1– 299" "$30 - 649" "$650– 999" ...
 $ Temporary Work (Skilled): num  405 2364 6496 19248 41595 ...
 $ Student                 : num  2169 33846 104569 27140 6737 ...
 $ New Zealand Citizen     : num  2446 16045 51337 104133 98986 ...
 $ Working Holiday Maker   : num  515 3670 18119 24476 7869 ...
 $ Other Temporary visa    : num  887 5325 24234 31975 16269 ...



structure(list(...1 = c("0", "$1– 299", "$30 - 649", "$650– 999"
), `Temporary Work (Skilled)` = c(405, 2364, 6496, 19248), Student = c(2169, 
33846, 104569, 27140), `New Zealand Citizen` = c(2446, 16045, 
51337, 104133), `Working Holiday Maker` = c(515, 3670, 18119, 
24476), `Other Temporary visa` = c(887, 5325, 24234, 31975)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

I used the table function on the imported data to create a crosstab, but all I get is more than 6000 matrix slices.

Recommended answer

There was a problem when you read your data in: the row names were read as a column labelled "...1". Usually R will recognize row names if there is one fewer column name than the number of columns. Nothing will work until you fix that.

library(tidyverse)
CTVI <- structure(list(...1 = c("0", "$1– 299", "$30 - 649", "$650– 999"), 
`Temporary Work (Skilled)` = c(405, 2364, 6496, 19248), Student = c(2169, 
33846, 104569, 27140), `New Zealand Citizen` = c(2446, 16045, 
51337, 104133), `Working Holiday Maker` = c(515, 3670, 18119, 
24476), `Other Temporary visa` = c(887, 5325, 24234, 31975)),  
row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
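As a side note on the two symptoms in the question, the sketch below reproduces them on a plain data frame. The name CTVI_df and the use of data.frame instead of a tibble are assumptions made so the example runs without extra packages. It suggests why margin.table complains (a data frame is not an array) and why table returns so many slices: it treats every column as a factor and cross-classifies all six of them at once.

```r
# Hypothetical stand-in for the imported data: same columns, base data frame.
CTVI_df <- data.frame(
  Income = c("0", "$1 - 299", "$30 - 649", "$650 - 999"),
  `Temporary Work (Skilled)` = c(405, 2364, 6496, 19248),
  Student = c(2169, 33846, 104569, 27140),
  `New Zealand Citizen` = c(2446, 16045, 51337, 104133),
  `Working Holiday Maker` = c(515, 3670, 18119, 24476),
  `Other Temporary visa` = c(887, 5325, 24234, 31975),
  check.names = FALSE
)

# margin.table() requires an array; a data frame is a list of columns.
is.array(CTVI_df)

# table() on a data frame cross-classifies ALL columns as factors,
# producing one array dimension per column rather than a 2-way table.
tab <- table(CTVI_df)
length(dim(tab))   # number of dimensions = number of columns
sum(tab)           # cells count rows of the data frame, not visa holders
```

With the full 9-row file each of the six dimensions would have 9 levels, which is consistent with the thousands of two-way slices printed in the question.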

Now we need to delete the first column, use it for the row names, and convert the tibble to a matrix, since some table functions such as addmargins and margin.table do not accept tibbles:

CTVI.mat <- as.matrix(CTVI[, -1])                  # keep only the numeric count columns
rownames(CTVI.mat) <- CTVI[[1]]                    # first column supplies the row names
names(dimnames(CTVI.mat)) <- c("Income", "Visa")   # label the two dimensions

Now we can compute margins or proportions:

margin.table(CTVI.mat, 1)            # row totals, one per income level
addmargins(CTVI.mat)                 # counts with row and column totals added
round(prop.table(CTVI.mat, 1), 3)    # visa proportions within each income level
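Since the question ultimately asks for a proportional bar chart, here is a minimal sketch of one way to draw it with base graphics. It rebuilds the counts from the sample data so that it runs on its own. The object names counts and row_props, the axis labels, and the income labels retyped with plain hyphens are assumptions for illustration, not part of the original answer.

```r
# Sample counts from the question (first four income bands only).
# matrix() fills column by column, so each group of four is one visa type.
counts <- matrix(
  c(405, 2364, 6496, 19248,
    2169, 33846, 104569, 27140,
    2446, 16045, 51337, 104133,
    515, 3670, 18119, 24476,
    887, 5325, 24234, 31975),
  nrow = 4,
  dimnames = list(
    Income = c("0", "$1 - 299", "$30 - 649", "$650 - 999"),
    Visa = c("Temporary Work (Skilled)", "Student", "New Zealand Citizen",
             "Working Holiday Maker", "Other Temporary visa")
  )
)

# Proportion of each visa category within an income level (rows sum to 1).
row_props <- prop.table(counts, margin = 1)

# barplot() draws one bar per COLUMN of its input and stacks the rows,
# so transpose to get one bar per income level, stacked by visa type.
barplot(
  t(row_props),
  legend.text = TRUE,
  args.legend = list(x = "topright", cex = 0.7),
  xlab = "Income",
  ylab = "Proportion of temporary visa holders"
)
```

Passing beside = TRUE to barplot would give grouped rather than stacked bars, if comparing visa categories side by side is more useful.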
