Cosine Similarity between 2 Number Lists

Question

I want to calculate the cosine similarity between two lists, for example list 1, dataSetI, and list 2, dataSetII.

Let's say dataSetI is [3, 45, 7, 2] and dataSetII is [2, 54, 13, 15]. The lists always have the same length. I want to report the cosine similarity as a number between 0 and 1.

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]

def cosine_similarity(list1, list2):
  # How to?
  pass

print(cosine_similarity(dataSetI, dataSetII))

Recommended answer

You should try SciPy. It has a bunch of useful scientific routines, for example "routines for computing integrals numerically, solving differential equations, optimization, and sparse matrices." It uses the fast, optimized NumPy for its number crunching. See the SciPy installation instructions to set it up.

Note that spatial.distance.cosine computes the distance, and not the similarity. So, you must subtract the value from 1 to get the similarity.

from scipy import spatial

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
# cosine() returns the distance, so subtract it from 1 to get the similarity
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)
print(result)
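
If installing SciPy is not an option, the same quantity can be computed from its definition, the dot product of the two lists divided by the product of their Euclidean norms. The following is a minimal sketch using only the standard library, filling in the cosine_similarity stub from the question. The length check and the zero-vector check are assumptions added for illustration, not part of the original answer.

```python
import math

def cosine_similarity(list1, list2):
    """Return the cosine similarity of two equal-length numeric lists."""
    if len(list1) != len(list2):
        raise ValueError("both lists must have the same length")
    # Dot product of the two lists
    dot = sum(a * b for a, b in zip(list1, list2))
    # Euclidean norm of each list
    norm1 = math.sqrt(sum(a * a for a in list1))
    norm2 = math.sqrt(sum(b * b for b in list2))
    if norm1 == 0 or norm2 == 0:
        # The ratio is undefined when either list is all zeros
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
print(cosine_similarity(dataSetI, dataSetII))  # approximately 0.9723
```

In general cosine similarity ranges from -1 to 1. It stays between 0 and 1, as the question asks, when all values in both lists are non-negative, which is the case for the example data.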
