R: How to get common counts (frequency) of levels of two factor variables by ID Variable (as new data frame)
Problem description
To make the question clear, let me start with a small example of my data frame.
ID <- c(rep("first", 2), rep("second", 4), rep("third",1), rep("fourth", 3))
Var_1 <- c(rep("A",2), rep("B", 2), rep("A",3), rep("B", 2), "A")
Var_2 <- c(rep("C",2), rep("D",3) , rep("C",2), rep("E",2), "D")
DF <- data.frame(ID, Var_1, Var_2)
> DF
ID Var_1 Var_2
1 first A C
2 first A C
3 second B D
4 second B D
5 second A D
6 second A C
7 third A C
8 fourth B E
9 fourth B E
10 fourth A D
There is one ID factor variable and two factor variables: Var_1 with R = 2 factor levels and Var_2 with C = 3 factor levels.
I would like to get a new data frame with (RxC)+1 = (2x3)+1 variables containing the frequencies of all combinations of factor levels, separately for each level of the ID variable, which would look like this:
ID A.C A.D A.E B.C B.D B.E
1 first 2 0 0 0 0 0
2 second 1 1 0 0 2 0
3 third 1 0 0 0 0 0
4 fourth 0 1 0 0 0 2
I tried a couple of functions, but the results were not even close to this, so they are not worth mentioning. In my original data frame I should get (6x9)+1 = 55 variables.
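The (RxC)+1 column count can be checked directly on the example data with `nlevels()`. This is a minimal sketch that reuses only the vectors from the example; the names `n_R`, `n_C` and `n_cols` are illustrative, not from the original question.

```r
# Vectors from the example above
Var_1 <- c(rep("A", 2), rep("B", 2), rep("A", 3), rep("B", 2), "A")
Var_2 <- c(rep("C", 2), rep("D", 3), rep("C", 2), rep("E", 2), "D")

n_R <- nlevels(factor(Var_1))  # 2 levels: A, B
n_C <- nlevels(factor(Var_2))  # 3 levels: C, D, E

# One column per level combination, plus one column for ID
n_cols <- n_R * n_C + 1
n_cols  # 7
```

With 6 and 9 levels the same expression gives 6 * 9 + 1 = 55.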
Edit: There are solutions for counting the factor levels of one or many variables separately, but I couldn't figure out how to get common counts for combinations of factor levels of two (or more) variables. Now that I have the answers, implementing the solution seems easy, but I could not get there by myself.
Recommended answer
Using the dcast function from the reshape2 package (or from data.table, which has an enhanced implementation of the dcast function):
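library(reshape2)
# One row per ID, one column per observed "Var_1.Var_2" combination; length() counts the rows in each cell
dcast(DF, ID ~ paste(Var_1, Var_2, sep = "."), fun.aggregate = length)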
Output:
ID A.C A.D B.D B.E
1 first 2 0 0 0
2 fourth 0 1 0 2
3 second 1 1 2 0
4 third 1 0 0 0
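The `dcast` output above contains only the combinations that actually occur (there are no A.E or B.C columns) and sorts the IDs alphabetically. If the exact layout from the question is wanted, with all RxC columns and the IDs in order of appearance, a base-R sketch using `interaction()` and `table()` instead of `dcast` could look like this. `count_combinations` is a hypothetical helper name, not from any package; because it takes a vector of column names, it should also work for more than two factor variables.

```r
# Toy data from the question
ID    <- c(rep("first", 2), rep("second", 4), rep("third", 1), rep("fourth", 3))
Var_1 <- c(rep("A", 2), rep("B", 2), rep("A", 3), rep("B", 2), "A")
Var_2 <- c(rep("C", 2), rep("D", 3), rep("C", 2), rep("E", 2), "D")
DF <- data.frame(ID, Var_1, Var_2)

# Hypothetical helper: count every combination of the levels of `vars` per ID
count_combinations <- function(df, id, vars) {
  # Keep IDs in order of first appearance rather than alphabetical order
  ids <- factor(df[[id]], levels = unique(df[[id]]))
  # interaction() keeps all level combinations (its default is drop = FALSE);
  # lex.order = TRUE makes the first variable vary slowest: A.C, A.D, A.E, B.C, ...
  combo <- do.call(interaction,
                   c(unname(as.list(df[vars])), sep = ".", lex.order = TRUE))
  # Cross-tabulate (unobserved combinations get 0) and convert to a data frame
  counts <- as.data.frame.matrix(table(ids, combo))
  data.frame(ID = rownames(counts), counts, row.names = NULL, check.names = FALSE)
}

result <- count_combinations(DF, "ID", c("Var_1", "Var_2"))
result
```

On the example data this should reproduce the table from the question, including the all-zero A.E and B.C columns; with 6 and 9 levels it would give 54 combination columns plus ID.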