Table of categorical variables by a grouping variable in R

Question

I have a dataset with some categorical variables plus a "cluster" variable. For example:

time <- c("Morning", "Evening" ,"Morning", "Morning", "Afternoon", "Evening", "Afternoon")
dollar <- c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10")
with_kids <- c("no", "yes", "yes", "no", "no", "yes", "yes")
cluster <- c(1,1,2,3,2,2,3)

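# data.frame() keeps named columns; cbind() would return a character matrix, and data$time would fail
data <- data.frame(time, dollar, with_kids, cluster)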

How can I create a frequency table of all the categorical variables by "cluster"?

The desired output is the table on the right (column % of each categorical variable within each cluster).

I know the code below works for one variable. What is the most efficient way to do this if I have many more categorical variables?

table(data$time, data$cluster)

Recommended answer

I'm not entirely sure of your desired output, but here are two possibilities.

A list of tables:

myList <- lapply(dat[head(names(dat), -1)], table, dat$cluster)
myList
$time

            1 2 3
  Afternoon 0 1 1
  Evening   1 1 0
  Morning   1 1 1

$dollar

        1 2 3
  1-5   1 1 1
  11-15 0 1 0
  6-10  1 1 1

$with_kids

      1 2 3
  no  1 1 1
  yes 1 2 1

To get a list of proportion tables, apply prop.table to each table in the list with lapply and pass margin=2, so that the proportions are computed within each column (cluster):

lapply(myList, prop.table, margin=2)
$time

                    1         2         3
  Afternoon 0.0000000 0.3333333 0.5000000
  Evening   0.5000000 0.3333333 0.0000000
  Morning   0.5000000 0.3333333 0.5000000

$dollar

                1         2         3
  1-5   0.5000000 0.3333333 0.5000000
  11-15 0.0000000 0.3333333 0.0000000
  6-10  0.5000000 0.3333333 0.5000000

$with_kids

              1         2         3
  no  0.5000000 0.3333333 0.5000000
  yes 0.5000000 0.6666667 0.5000000
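
Since the question asks for column percentages rather than proportions, the same idea can be extended with a small anonymous function. The following is only a sketch: the names `cat_vars` and `pctList` are made up for illustration, and the data is rebuilt with `data.frame()` so that the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained
dat <- data.frame(
  time      = c("Morning", "Evening", "Morning", "Morning",
                "Afternoon", "Evening", "Afternoon"),
  dollar    = c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10"),
  with_kids = c("no", "yes", "yes", "no", "no", "yes", "yes"),
  cluster   = c(1, 1, 2, 3, 2, 2, 3),
  stringsAsFactors = TRUE
)

# Every column except the grouping variable (cat_vars is an illustrative name)
cat_vars <- setdiff(names(dat), "cluster")

# One count table per categorical variable, cross-tabulated with cluster
myList <- lapply(dat[cat_vars], table, dat$cluster)

# Column percentages within each cluster, rounded to one decimal place
pctList <- lapply(myList, function(tab) {
  round(100 * prop.table(tab, margin = 2), 1)
})

pctList$with_kids
```

Selecting the columns by name with `setdiff()` instead of `head(names(dat), -1)` is a design choice here: it does not depend on the grouping variable being the last column.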

Bind them together:

do.call(rbind, lapply(dat[head(names(dat), -1)], table, dat$cluster))
          1 2 3
Afternoon 0 1 1
Evening   1 1 0
Morning   1 1 1
1-5       1 1 1
11-15     0 1 0
6-10      1 1 1
no        1 1 1
yes       1 2 1
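
The two approaches can probably be combined to get closer to the single table of column percentages described in the question. This is a hedged sketch rather than the answerer's method: the names `cat_vars`, `pct_tables` and `result`, as well as the "variable: level" row labels, are assumptions added for illustration.

```r
# Rebuild the example data so the snippet is self-contained
dat <- data.frame(
  time      = c("Morning", "Evening", "Morning", "Morning",
                "Afternoon", "Evening", "Afternoon"),
  dollar    = c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10"),
  with_kids = c("no", "yes", "yes", "no", "no", "yes", "yes"),
  cluster   = c(1, 1, 2, 3, 2, 2, 3),
  stringsAsFactors = TRUE
)

# Every column except the grouping variable (illustrative name)
cat_vars <- setdiff(names(dat), "cluster")

# For each variable: column proportions by cluster, with the variable name
# prefixed to the row labels so rows stay unambiguous after binding
pct_tables <- lapply(cat_vars, function(v) {
  tab <- prop.table(table(dat[[v]], dat$cluster), margin = 2)
  rownames(tab) <- paste(v, rownames(tab), sep = ": ")
  tab
})

# Stack everything into one matrix and express it as rounded percentages
result <- round(100 * do.call(rbind, pct_tables), 1)
result
```

Prefixing the row labels matters once there are many variables, because two variables can share a level name such as "yes" or "no".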

Data

dat <- 
structure(list(time = structure(c(3L, 2L, 3L, 3L, 1L, 2L, 1L), .Label = c("Afternoon", 
"Evening", "Morning"), class = "factor"), dollar = structure(c(1L, 
3L, 2L, 1L, 1L, 3L, 3L), .Label = c("1-5", "11-15", "6-10"), class = "factor"), 
    with_kids = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("no", 
    "yes"), class = "factor"), cluster = c(1, 1, 2, 3, 2, 2, 
    3)), .Names = c("time", "dollar", "with_kids", "cluster"), row.names = c(NA, 
-7L), class = "data.frame")
