Populating a co-occurrence matrix
Problem description
I am looking for a fast and efficient way to populate a co-occurrence matrix (so to speak). Here is a sample of the data I am working with:
col1 col2
a e
a f
a e
b f
c g
a e
d f
a e
a g
b e
c e
I want a matrix of the following form:
... e... f... g
a
b
c
d
with each entry holding the frequency of the corresponding pair.
For example, element (3,1) of the matrix corresponds to the co-occurrence frequency of (c,e) and should have a value of 1, while element (1,1) should have a value of 4, corresponding to the 4 entries of (a,e) in the sample above.
I am currently counting the items individually using two for loops, and it takes an extremely long time to compute the matrix (the actual data has about a million rows).
Recommended answer
You can use sparse to do exactly what you need:
spA = sparse(data(:,1), data(:,2), 1);
where data is your data, but as numbers. So you first have to convert the alphabetic labels to numeric values (doubles) that can serve as row and column indices.
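One possible way to do that conversion is sketched below, assuming the two columns are held as cell arrays of character vectors. The variable names (col1, col2, rowIdx, colIdx and so on) are illustrative and not part of the original answer; unique is used here only as one convenient way to map labels to consecutive indices.

```matlab
% Sample data from the question, one label per row (assumed storage format).
col1 = {'a';'a';'a';'b';'c';'a';'d';'a';'a';'b';'c'};
col2 = {'e';'f';'e';'f';'g';'e';'f';'e';'g';'e';'e'};

% unique returns the sorted distinct labels and, as its third output,
% the index of each original entry into that sorted list.
[rowLabels, ~, rowIdx] = unique(col1);
[colLabels, ~, colIdx] = unique(col2);

% sparse sums the values of repeated (row, column) pairs, so passing 1
% for every data row yields the pair counts.
spA = sparse(rowIdx, colIdx, 1, numel(rowLabels), numel(colLabels));

% Dense copy for inspection; only sensible when the matrix is small.
A = full(spA);
```

Rows of A follow the order of rowLabels and columns follow colLabels, so the labels can be kept alongside the matrix for lookup.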
sparse assembles row/column pairs from data(:,1) and data(:,2), adding 1 for every occurrence of a pair. Note, however, that if you expect the matrix to be symmetric, you might need to sum spA and its transpose, depending on your data.
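The symmetric case can be sketched as follows, under the assumption that both columns draw from the same label set and that the order within a pair does not matter. The pairs data here is hypothetical, and whether the diagonal should be corrected as shown depends on how you want self-pairs counted.

```matlab
% Hypothetical data in which both columns share one label set.
pairs = {'a','b'; 'b','a'; 'a','c'; 'c','c'};

% Index both columns against a single sorted label list.
[labels, ~, idx] = unique(pairs(:));
idx = reshape(idx, size(pairs));
n = numel(labels);

% Directed counts: entry (i,j) counts rows with label i first and j second.
spA = sparse(idx(:,1), idx(:,2), 1, n, n);

% Adding the transpose makes the counts independent of pair order, but it
% also doubles the diagonal, so the diagonal is subtracted once.
spS = spA + spA.' - diag(diag(spA));
```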