Aggregating factor level counts - by factor

Question

I have been trying to build a table that shows the counts of each factor level, broken down by another factor. I have gone through dozens of pages and questions and tried functions from several packages (dplyr, reshape), but I have not managed to use them correctly.

Here is what I have:

# my data:
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1,var2,var3,var4)
head(mydata)

Attempt n+1: this only displays the totals for each variable by the other factor, not the counts per level.

t(aggregate(. ~ var1, mydata, sum))

      [,1]   [,2] 
var1 "blue" "red"
var2 " 5"   "12" 
var3 " 5"   "18" 
var4 " 6"   "16" 

Attempt n+2: this gives the correct format, but I couldn't get it to work on more than one factor.

library(plyr)   # ddply() comes from plyr, not dplyr
data1 <- ddply(mydata, c("var1", "var3"), summarise,
            N    = length(var1))
library(reshape)
df1 <- cast(data1, var1 ~ var3, sum)
df1 <- t(df1)
df1

   blue red
1    3   6
2    1   3
3    0   2

What I want is:

        blue red
var2.0    3  10
var2.1    1   1
var3.1    3   6
var3.2    1   3
var3.3    0   2
var4.0    2   6
var4.1    2   5

How can I get this format? Many thanks in advance.

Recommended answer

We can melt the dataset with 'var1' as the id variable and then use table:

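library(reshape2)
# Melt to long format (columns: var1, variable, value), add a
# "variable.value" label as column varN, then cross-tabulate varN by var1.
long <- transform(melt(mydata, id.vars = "var1"),
        varN = paste(variable, value, sep = "."))
tbl <- table(long[c("varN", "var1")])
names(dimnames(tbl)) <- NULL   # drop the "varN"/"var1" dimension labels
tbl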
#
#         blue red
#  var2.0    3  10
#  var2.1    1   1
#  var3.1    3   6
#  var3.2    1   3
#  var3.3    0   2
#  var4.0    2   6
#  var4.1    2   5
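
For readers who would rather not load reshape2, the same counts can probably be obtained with base R alone. This is only a sketch and not part of the original answer: it builds one table() per variable against var1 and stacks them with rbind. The names vars, tabs and res are made up for illustration.

```r
# Data from the question, repeated so the block runs on its own
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1, var2, var3, var4)

vars <- c("var2", "var3", "var4")   # columns to count (illustrative name)

# One level-by-var1 count matrix per variable, with rows labelled "var.level"
tabs <- lapply(vars, function(v) {
  m <- unclass(table(mydata[[v]], mydata$var1))
  rownames(m) <- paste(v, rownames(m), sep = ".")
  m
})

# Stack the per-variable matrices into a single matrix
res <- do.call(rbind, tabs)
res
```

Because every call tabulates against the same var1 column, all pieces share the blue/red columns, so rbind lines them up without further work.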


Or, using dplyr/tidyr: convert the dataset from 'wide' to 'long' format with gather, unite the columns ('var', 'val') to create 'varV', get the frequency (tally) after grouping by 'var1' and 'varV', and then spread back to 'wide' format.

library(dplyr)
library(tidyr)
gather(mydata, var, val, -var1) %>% 
           unite(varV,var, val, sep=".") %>%
           group_by(var1, varV) %>% 
           tally() %>% 
           spread(var1, n, fill = 0)
#    varV  blue   red
#   <chr> <dbl> <dbl>
#1 var2.0     3    10
#2 var2.1     1     1
#3 var3.1     3     6
#4 var3.2     1     3
#5 var3.3     0     2
#6 var4.0     2     6
#7 var4.1     2     5
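
In more recent tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The pipeline above could be rewritten roughly as follows. This is a sketch that assumes tidyr 1.0.0 or later and is not taken from the original answer; count() stands in for group_by() plus tally(), and the final arrange() is added only to sort the rows by label.

```r
library(dplyr)
library(tidyr)

# Data from the question, repeated so the block runs on its own
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1, var2, var3, var4)

res <- mydata %>%
  # wide -> long: one row per (var1, variable, value)
  pivot_longer(-var1, names_to = "var", values_to = "val") %>%
  # combine variable name and level into a single label such as "var2.0"
  unite("varV", var, val, sep = ".") %>%
  # count rows for each var1 / label combination
  count(var1, varV) %>%
  # long -> wide: one column per var1 level, missing combinations filled with 0
  pivot_wider(names_from = var1, values_from = n, values_fill = list(n = 0L)) %>%
  arrange(varV)
res
```

The values_fill argument is given as a named list because that form is accepted across tidyr 1.x versions.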
