Convert categorical data in data frame to weighted adjacency matrix


Problem description

I have the following data frame, call it DF, which consists of three vectors: "Chunk", "Name", and "Frequency". I need to turn it into a NameXName adjacency matrix where Names are considered adjacent when they reside in the same chunk. For example, in the first lines, Gretel and Friedrich are adjacent because they are both in Chunk 2. The weight of the relationship should be based on "Frequency", precisely the number of times they are co-present in the same chunk, so for the Gretel/Friedrich example, Frequency(Gretel) + Frequency(Friedrich) - 1 = 2 + 4 - 1 = 5.

    Chunk         Name Frequency  
1       2       Gretel         2  
2       2      Pollock         1 
3       2       Adorno         1   
4       2    Friedrich         4  
5       3          Max         1 
6       3   Horkheimer         1  
7       3       Adorno         1   
8       4    Friedrich         5  
9       4      Pollock         1 
10      4        March         1 
11      5        Comte         3  
12      7      Jaspers         1  
13      7       Huxley         2  
14      8    Nietzsche         1 
15      8         Sade         2 
16      8        Felix         1  
17      8         Weil         1 
18      8      Western         1 
19      8    Lowenthal         1 
20      8         Kant         1 
21      8       Hitler         1 
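
For anyone who wants to reproduce the example, the table above can be rebuilt in R roughly as follows. The column types (numeric Chunk and Frequency, character Name) are an assumption, since only the printed table is shown; `chunk2` and `w` are illustrative names that do not appear in the original post.

```r
# Rebuild the example data frame from the printed table (column types assumed).
DF <- data.frame(
  Chunk = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8),
  Name = c("Gretel", "Pollock", "Adorno", "Friedrich",
           "Max", "Horkheimer", "Adorno",
           "Friedrich", "Pollock", "March",
           "Comte", "Jaspers", "Huxley",
           "Nietzsche", "Sade", "Felix", "Weil", "Western",
           "Lowenthal", "Kant", "Hitler"),
  Frequency = c(2, 1, 1, 4, 1, 1, 1, 5, 1, 1, 3, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Weight for the Gretel/Friedrich pair in chunk 2, using the rule stated above:
# Frequency(Gretel) + Frequency(Friedrich) - 1
chunk2 <- DF[DF$Chunk == 2, ]
w <- chunk2$Frequency[chunk2$Name == "Gretel"] +
  chunk2$Frequency[chunk2$Name == "Friedrich"] - 1
w  # 5
```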

I started to take a crack at this by splitting the data frame according to DF$Chunk:

> DF.split<-split(DF, DF$Chunk) 

$`2`
  Chunk      Name Frequency
1     2    Gretel         2
2     2   Pollock         1
3     2    Adorno         1
4     2 Friedrich         4

$`3`
  Chunk       Name Frequency
5     3        Max         1
6     3 Horkheimer         1
7     3     Adorno         1

$`4`
   Chunk      Name Frequency
8      4 Friedrich         5
9      4   Pollock         1
10     4     March         1

which I thought got closer, but it returns list items that I am having trouble turning back into workable data frames.

I have also tried to start by turning this into a ChunkXName adjacency matrix:

> chunkbyname<-tapply(DF$Frequency , list(DF$Name,DF$Chunk) , as.character )

with the hopes of multiplying chunkbyname by its transpose to get the NameXName matrix, but it seems the matrix is too sparse or complex (Error in a %*% b : requires numeric/complex matrix/vector arguments).

Any help getting this data frame into an adjacency matrix would be greatly appreciated.
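
A note on the `%*%` error quoted above: passing `as.character` to `tapply` produces a character matrix (with `NA` in the empty cells), and `%*%` only accepts numeric or complex input, so sparsity is probably not the cause. The sketch below keeps the same `tapply` call but swaps in `sum`, which is an assumption about what was intended. It uses only the first ten rows of the data, and the names `m`, `prod_adj`, `inc` and `shared` are illustrative. Be aware that the matrix product gives sums of frequency products, not the Frequency + Frequency - 1 weights described in the question.

```r
# First ten rows of the example data (chunks 2 to 4); column types are assumed.
DF <- data.frame(
  Chunk = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  Name = c("Gretel", "Pollock", "Adorno", "Friedrich", "Max",
           "Horkheimer", "Adorno", "Friedrich", "Pollock", "March"),
  Frequency = c(2, 1, 1, 4, 1, 1, 1, 5, 1, 1),
  stringsAsFactors = FALSE
)

# Same call as in the question, but with sum() so the cells stay numeric.
m <- tapply(DF$Frequency, list(DF$Name, DF$Chunk), sum)
m[is.na(m)] <- 0  # empty Name/Chunk cells become 0 instead of NA

# Name x Name matrix: sum over chunks of Frequency(i) * Frequency(j).
prod_adj <- m %*% t(m)
prod_adj["Gretel", "Friedrich"]  # 2 * 4 = 8

# 0/1 incidence version: number of chunks each pair shares.
inc <- (m > 0) * 1
shared <- inc %*% t(inc)
shared["Friedrich", "Pollock"]  # chunks 2 and 4, so 2
```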

Solution

Is this what you are looking for?

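# For each chunk, build a Name x Name matrix of pairwise weights.
# DF is the data frame from the question.
df3 <- by(DF, DF$Chunk, function(x){
  # Frequency(i) + Frequency(j) - 1 for every pair of names in the chunk
  mm <- outer(x$Frequency, x$Frequency, "+") - 1
  rownames(mm) <- x$Name
  colnames(mm) <- x$Name
  mm
})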

df3

# $`2`
#           Gretel Pollock Adorno Friedrich
# Gretel         3       2      2         5
# Pollock        2       1      1         4
# Adorno         2       1      1         4
# Friedrich      5       4      4         7
# 
# $`3`
#            Max Horkheimer Adorno
# Max          1          1      1
# Horkheimer   1          1      1
# Adorno       1          1      1
# 
# $`4`
#           Friedrich Pollock March
# Friedrich         9       5     5
# Pollock           5       1     1
# March             5       1     1
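
The result above is one matrix per chunk, while the question asks for a single NameXName matrix. One possible way to merge them is sketched below. It rests on two assumptions that the original post does not confirm: weights for a pair that shares several chunks are added together, and the diagonal (a name paired with itself) is set to 0. It also assumes no name is repeated within a chunk. The example uses the first ten rows of the data and `split` plus `lapply` instead of `by`, so that the per-chunk result is a plain list; `per_chunk`, `all_names` and `adj` are illustrative names.

```r
# First ten rows of the example data (chunks 2 to 4); column types are assumed.
DF <- data.frame(
  Chunk = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  Name = c("Gretel", "Pollock", "Adorno", "Friedrich", "Max",
           "Horkheimer", "Adorno", "Friedrich", "Pollock", "March"),
  Frequency = c(2, 1, 1, 4, 1, 1, 1, 5, 1, 1),
  stringsAsFactors = FALSE
)

# Per-chunk weight matrices, same formula as in the answer.
per_chunk <- lapply(split(DF, DF$Chunk), function(x) {
  mm <- outer(x$Frequency, x$Frequency, "+") - 1
  dimnames(mm) <- list(x$Name, x$Name)
  mm
})

# Empty Name x Name matrix covering every name in the data.
all_names <- sort(unique(DF$Name))
adj <- matrix(0, nrow = length(all_names), ncol = length(all_names),
              dimnames = list(all_names, all_names))

# Add each chunk's block into the matching rows and columns (assumed rule).
for (mm in per_chunk) {
  adj[rownames(mm), colnames(mm)] <- adj[rownames(mm), colnames(mm)] + mm
}

# Drop self-pairs (assumption: a name is not adjacent to itself).
diag(adj) <- 0

adj["Gretel", "Friedrich"]   # 5, from chunk 2 only
adj["Friedrich", "Pollock"]  # 4 (chunk 2) + 5 (chunk 4) = 9
```

Names that never share a chunk, such as Gretel and Max, keep a weight of 0.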
