Need to create a crosstab with 2 categorical factor variables?
Problem description
I have two categorical variables, income level and temporary visa status, and the count for each combination.
All I need is a crosstab from which to build a proportion bar chart, showing the proportion of each temporary visa category within an income level.
library(readxl)
Crosstab_Temporary_visas_income <- read_excel("C:/Users/axelp/Documents/RMIT/Semester 2/Data Visualisation/Assignment 3/Crosstab Temporary visas income.xls")
str(Crosstab_Temporary_visas_income)
margin.table(Crosstab_Temporary_visas_income,1) #Row marginals
Error in margin.table(Crosstab_Temporary_visas_income, 1) :
'x' is not an array
> str(Crosstab_Temporary_visas_income)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 9 obs. of 6 variables:
$ Income : chr "Negative / nil income" "$1– 299" "$30 - 649" "$650– 999" ...
$ Temporary Work (Skilled): num 405 2364 6496 19248 41595 ...
$ Student : num 2169 33846 104569 27140 6737 ...
$ New Zealand Citizen : num 2446 16045 51337 104133 98986 ...
$ Working Holiday Maker : num 515 3670 18119 24476 7869 ...
$ Other Temporary visa : num 887 5325 24234 31975 16269 ...
structure(list(...1 = c("0", "$1– 299", "$30 - 649", "$650– 999"
), `Temporary Work (Skilled)` = c(405, 2364, 6496, 19248), Student = c(2169,
33846, 104569, 27140), `New Zealand Citizen` = c(2446, 16045,
51337, 104133), `Working Holiday Maker` = c(515, 3670, 18119,
24476), `Other Temporary visa` = c(887, 5325, 24234, 31975)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
I used the table function on the imported file to create a crosstab, but all I get is more than 6000 matrix slices.
Recommended answer
There was a problem when you read your data in: the row labels were read as an ordinary column named "...1". Base R readers such as read.table() use the first column as row names when the header has one fewer name than there are columns, but read_excel() returns a tibble, and tibbles do not carry row names. Functions such as margin.table() will not work until you fix that.
library(tidyverse)
CTVI <- structure(list(...1 = c("0", "$1– 299", "$30 - 649", "$650– 999"),
`Temporary Work (Skilled)` = c(405, 2364, 6496, 19248), Student = c(2169,
33846, 104569, 27140), `New Zealand Citizen` = c(2446, 16045,
51337, 104133), `Working Holiday Maker` = c(515, 3670, 18119,
24476), `Other Temporary visa` = c(887, 5325, 24234, 31975)),
row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
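As an aside, the "more than 6000 matrix slices" in the question is consistent with table() treating every column of the data frame as a factor and cross-tabulating all of them at once. With 9 rows and 6 columns whose values are all distinct, that would give a 9^6 array, which prints as 9^4 = 6561 two-way slices. The sketch below shows the same effect on a hypothetical cut-down copy of the data (the column name Skilled and the choice of only two visa columns are assumptions made for brevity):

```r
# Hypothetical cut-down copy of the question's data: the income labels
# plus the first two visa columns only.
df <- data.frame(
  Income  = c("0", "$1– 299", "$30 - 649", "$650– 999"),
  Skilled = c(405, 2364, 6496, 19248),
  Student = c(2169, 33846, 104569, 27140)
)

# table() cross-tabulates every column against every other column,
# so the result has one dimension per column, not a 2-way crosstab.
tab <- table(df)

dim(tab)   # one dimension per column of df
sum(tab)   # each row of df contributes a single count of 1
```

Because the counts are already tabulated in the spreadsheet, what is needed is a conversion of the existing numbers to a matrix rather than another call to table().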
Now we need to drop the first column, use it for the row names, and convert the tibble to a matrix, since some table functions such as addmargins and margin.table do not accept tibbles:
CTVI.mat <- as.matrix(CTVI[, -1])   # numeric visa columns only; the label column is dropped here
rownames(CTVI.mat) <- CTVI[[1]]     # income labels from the first column
names(dimnames(CTVI.mat)) <- c("Income", "Visa")
Now we can compute margins or proportions:
margin.table(CTVI.mat, 1)           # row totals per income level
addmargins(CTVI.mat)                # counts with row and column totals added
round(prop.table(CTVI.mat, 1), 3)   # visa proportions within each income level
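The question's end goal was a proportion bar chart, so here is a minimal sketch of one way to draw it from the row proportions using base graphics. It rebuilds the four sample rows as a matrix so that it runs on its own; the object names counts and props and the axis labels are illustrative assumptions, not part of the original answer.

```r
# Rebuild the four sample rows as a count matrix (filled column by column).
visa   <- c("Temporary Work (Skilled)", "Student", "New Zealand Citizen",
            "Working Holiday Maker", "Other Temporary visa")
income <- c("0", "$1– 299", "$30 - 649", "$650– 999")
counts <- matrix(
  c(405, 2364, 6496, 19248,        # Temporary Work (Skilled)
    2169, 33846, 104569, 27140,    # Student
    2446, 16045, 51337, 104133,    # New Zealand Citizen
    515, 3670, 18119, 24476,       # Working Holiday Maker
    887, 5325, 24234, 31975),      # Other Temporary visa
  nrow = 4, dimnames = list(Income = income, Visa = visa)
)

# Proportion of each visa category within an income level (rows sum to 1).
props <- prop.table(counts, 1)

# barplot() draws one bar per matrix column, so transpose to get one
# stacked bar per income level with one segment per visa category.
barplot(t(props), legend.text = TRUE,
        xlab = "Income", ylab = "Proportion within income level")
```

Passing beside = TRUE to barplot() would give grouped rather than stacked bars if that reads better for the assignment.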