Co-occurrence matrix from nested list of words
Question
I have a list of names, for example:
names = ['A', 'B', 'C', 'D']
and a list of documents, each of which mentions some of these names:
document = [['A', 'B'], ['C', 'B', 'K'], ['A', 'B', 'C', 'D', 'Z']]
I would like to get the output as a co-occurrence matrix, like this:
  A B C D
A 0 2 1 1
B 2 0 2 1
C 1 2 0 1
D 1 1 1 0
There is a solution to this problem in R (Creating co-occurrence matrix), but I couldn't reproduce it in Python. I am thinking of doing it in Pandas, but I have made no progress so far.
Recommended answer
Obviously this can be extended for your purposes, but it performs the general operation you have in mind:
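# Assumes `document` is the nested list defined in the question.
for a in 'ABCD':
    for b in 'ABCD':
        count = 0
        for x in document:
            if a != b:
                # Different names: count the documents that mention both.
                if a in x and b in x:
                    count += 1
            else:
                # Same name: count pairs of repeated mentions, i.e. C(n, 2).
                n = x.count(a)
                if n >= 2:
                    count += n * (n - 1) // 2
        print('{} x {} = {}'.format(a, b, count))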
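The loop above prints one line per pair rather than building the matrix the question asks for. As a minimal sketch (not part of the original answer), the same counting can be collected into a nested dictionary and printed as a table. The function name `cooccurrence_counts` is hypothetical, and unlike the loop above this version always leaves the diagonal at zero, which matches the expected output in the question.

```python
from itertools import permutations

names = ['A', 'B', 'C', 'D']
document = [['A', 'B'], ['C', 'B', 'K'], ['A', 'B', 'C', 'D', 'Z']]


def cooccurrence_counts(names, documents):
    """Return counts[a][b]: how many documents mention both a and b (a != b)."""
    # Hypothetical helper for illustration; start with an all-zero table.
    counts = {a: {b: 0 for b in names} for a in names}
    for doc in documents:
        mentioned = set(doc)
        # Keep only the tracked names, so extras such as 'K' and 'Z' are ignored.
        present = [name for name in names if name in mentioned]
        # Every ordered pair of distinct names in this document co-occurs once.
        for a, b in permutations(present, 2):
            counts[a][b] += 1
    return counts


counts = cooccurrence_counts(names, document)

# Print the table in the layout used in the question.
print('  ' + ' '.join(names))
for a in names:
    print(a + ' ' + ' '.join(str(counts[a][b]) for b in names))
```

Since the question mentions Pandas: if pandas is installed, passing the nested dictionary to `pd.DataFrame(counts)` should give a labeled table of the same values. Outer keys become columns there, which makes no difference here because the matrix is symmetric.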