R: How to get joint counts (frequencies) of the levels of two factor variables by ID variable (as a new data frame)

Question

To make the question clear, let me start with a small toy example of my data frame.

ID <- c(rep("first", 2), rep("second", 4), rep("third",1), rep("fourth", 3))
Var_1 <- c(rep("A",2), rep("B", 2), rep("A",3), rep("B", 2), "A")
Var_2 <- c(rep("C",2), rep("D",3) , rep("C",2),  rep("E",2), "D")

DF <- data.frame(ID, Var_1, Var_2)

> DF
       ID  Var_1 Var_2
1   first     A     C
2   first     A     C
3  second     B     D
4  second     B     D
5  second     A     D
6  second     A     C
7   third     A     C
8  fourth     B     E
9  fourth     B     E
10 fourth     A     D

There is one ID factor variable and two further factor variables: Var_1 with R = 2 factor levels and Var_2 with C = 3 factor levels.

I would like to get a new data frame with (R x C) + 1 = (2 x 3) + 1 = 7 variables that holds the frequencies of all combinations of factor levels, separately for each level of the ID variable. It would look like this:

      ID   A.C  A.D  A.E  B.C  B.D  B.E
1  first    2    0    0    0    0    0
2 second    1    1    0    0    2    0
3  third    1    0    0    0    0    0
4 fourth    0    1    0    0    0    2

I tried a couple of functions, but the results were not even close to this, so they are not worth mentioning. In the original data frame I should get (6 x 9) + 1 = 55 variables.
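The expected width of the result can be checked directly from the factor levels. The following is a minimal sketch on the toy data; the names n_pairs and expected_cols are illustrative and not part of the original question:

```r
# Toy data from the question
Var_1 <- c(rep("A", 2), rep("B", 2), rep("A", 3), rep("B", 2), "A")
Var_2 <- c(rep("C", 2), rep("D", 3), rep("C", 2), rep("E", 2), "D")

# One column per (Var_1, Var_2) level pair, plus one column for ID
n_pairs <- nlevels(factor(Var_1)) * nlevels(factor(Var_2))
expected_cols <- n_pairs + 1
expected_cols
```

The same calculation with 6 and 9 levels gives the 55 variables mentioned for the original data.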

Edit: There are solutions for counting factor levels for one or many variables separately, but I couldn't figure out how to get joint counts for combinations of factor levels of two (or more) variables. Applying the solution to other cases seems easy now that I have the answers, but I could not get there by myself.

Recommended answer

Using the dcast function from the reshape2 package (or from data.table, which has an enhanced implementation of the dcast function):

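library(reshape2)
# One row per ID, one column per observed Var_1.Var_2 pair; length() counts the rows in each cell
dcast(DF, ID ~ paste(Var_1, Var_2, sep = "."), fun.aggregate = length)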

      ID A.C A.D B.D B.E
1  first   2   0   0   0
2 fourth   0   1   0   2
3 second   1   1   2   0
4  third   1   0   0   0
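
Note that the output above only has columns for combinations that actually occur (A.E and B.C are missing) and lists the IDs alphabetically. If the zero-count columns and the ID order from the question are wanted, a base R sketch along the following lines may help. It swaps dcast for interaction() and table(), and the object names combo, counts, and wide are illustrative assumptions rather than part of the original answer:

```r
# Toy data from the question
ID    <- c(rep("first", 2), rep("second", 4), rep("third", 1), rep("fourth", 3))
Var_1 <- c(rep("A", 2), rep("B", 2), rep("A", 3), rep("B", 2), "A")
Var_2 <- c(rep("C", 2), rep("D", 3), rep("C", 2), rep("E", 2), "D")
DF <- data.frame(ID, Var_1, Var_2, stringsAsFactors = TRUE)

# Keep the IDs in order of first appearance instead of alphabetical order
DF$ID <- factor(DF$ID, levels = unique(as.character(DF$ID)))

# interaction() creates a level for every Var_1 x Var_2 pair, including
# pairs that never occur; lex.order = TRUE orders them A.C, A.D, A.E, B.C, ...
combo <- interaction(DF$Var_1, DF$Var_2, sep = ".", lex.order = TRUE)

# Cross-tabulate ID against the combined factor, then convert to a data frame
counts <- table(DF$ID, combo)
wide <- as.data.frame.matrix(counts)
wide <- cbind(ID = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

On the toy data this should give one row per ID and all six combination columns, matching the layout requested in the question. For more than two variables, further factors can be passed to interaction() in the same way.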
