Populating a co-occurrence matrix

Question

I am looking for a fast and efficient way to populate a co-occurrence matrix (so to speak). Here is a sample of the data I am working with:

col1 col2
a e    
a f    
a e    
b f    
c g    
a e    
d f    
a e    
a g    
b e    
c e

I want a matrix of the following form:

     e    f    g
a
b
c
d

Each entry should hold the number of times the corresponding (row, column) pair occurs in the data.

For example, element (3,1) of the matrix is the co-occurrence frequency of (c,e) and should have a value of 1, while element (1,1) should have a value of 4, matching the 4 entries of (a,e) in the sample above.

I am currently computing each entry individually with two for loops, and it takes an extremely long time to build the matrix (the actual data has about a million rows).

Recommended answer

You can use sparse to do exactly what you need:

% Add 1 at position (data(k,1), data(k,2)) for every row k of data.
spA = sparse(data(:,1), data(:,2), 1);

where data is your data, but as numbers. sparse needs positive integer row and column indices, so you first have to convert the alphabetic labels to numeric indices (stored as doubles).
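One way to do that conversion is sketched below. It assumes the two columns are held as cell arrays of labels, which I have named col1 and col2 (both names, and rowLabels, colLabels and counts, are mine, not from the question). The third output of unique gives each row's position in the sorted list of distinct labels, which is the kind of index sparse expects.

```matlab
% Hypothetical input: the eleven sample rows from the question.
col1 = {'a';'a';'a';'b';'c';'a';'d';'a';'a';'b';'c'};
col2 = {'e';'f';'e';'f';'g';'e';'f';'e';'g';'e';'e'};

% unique returns the sorted distinct labels and, as its third output,
% the index of each row's label within that sorted list.
[rowLabels, ~, rowIdx] = unique(col1);
[colLabels, ~, colIdx] = unique(col2);

% Numeric index pairs, one row per observation.
data = [rowIdx, colIdx];

% Passing the size explicitly keeps the matrix shape tied to the label lists.
spA = sparse(data(:,1), data(:,2), 1, numel(rowLabels), numel(colLabels));

% Dense copy for inspection only; keep the sparse form for large data.
counts = full(spA);
```

Row i of counts then corresponds to rowLabels{i} and column j to colLabels{j}, so the labels can be reattached when the result is displayed.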

sparse assembles row/column pairs from data(:,1) and data(:,2) and accumulates the values of repeated pairs, so it adds 1 for every occurrence of a pair. Note, however, that if you expect the matrix to be symmetric, you might need to sum spA and its transpose, depending on your data.
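A small sketch of both points, using made-up index pairs rather than the question's data. It assumes both columns index into one shared set of n items, which is the case where a symmetric matrix makes sense; pairs, n and spSym are names I have introduced for illustration.

```matlab
% Hypothetical index pairs over a shared set of 3 items; (1,2) appears twice.
pairs = [1 2; 1 2; 2 3; 3 1];
n = 3;

% Repeated subscripts are summed, so entry (1,2) ends up as 2.
spA = sparse(pairs(:,1), pairs(:,2), 1, n, n);

% Adding the transpose counts each pair regardless of its order.
spSym = spA + spA.';

symCounts = full(spSym);
% symCounts is [0 2 1; 2 0 1; 1 1 0]
```

If the data can contain pairs of an item with itself, those diagonal entries are doubled by this sum and may need to be corrected separately.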
