Cosine Similarity of Vectors of different lengths?
Problem description
I'm trying to use TF-IDF to sort documents into categories. I've calculated the tf_idf for some documents, but now when I try to calculate the Cosine Similarity between two of these documents I get a traceback saying:
#len(u)==201, len(v)==246
cosine_distance(u, v)
ValueError: objects are not aligned
#this works though:
cosine_distance(u[:200], v[:200])
>> 0.52230249969265641
Is slicing the vector so that len(u)==len(v) the right approach? I would think that cosine similarity would work with vectors of different lengths.
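For context, cosine similarity is only defined for two vectors of the same dimension, and each position has to refer to the same term in both. If the two TF-IDF vectors were built from each document's own vocabulary (an assumption, since the question does not show how `u` and `v` were produced), slicing to a common length would compare unrelated terms. One possible approach is to project both documents onto a shared vocabulary first. The sketch below uses plain Python; the helper names `align` and `cosine_similarity` and the sample weights are hypothetical, not taken from the question.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric sequences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

def align(weights_a, weights_b):
    """Project two {term: weight} dicts onto one shared, sorted vocabulary."""
    vocab = sorted(set(weights_a) | set(weights_b))
    # A term missing from a document gets weight 0.0 in that document's vector.
    u = [weights_a.get(term, 0.0) for term in vocab]
    v = [weights_b.get(term, 0.0) for term in vocab]
    return u, v

# Hypothetical TF-IDF weights for two documents with different vocabularies.
doc_a = {"apple": 0.5, "banana": 0.5}
doc_b = {"banana": 0.5, "cherry": 0.5, "date": 0.5}

u, v = align(doc_a, doc_b)
similarity = cosine_similarity(u, v)
print(len(u), len(v), similarity)
```

With this layout both vectors have one entry per term in the combined vocabulary, so their lengths always match and no slicing is needed.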