How do I calculate the cosine similarity of two vectors?

Question

How do I find the cosine similarity between vectors?

I need to find the similarity to measure the relatedness between two lines of text.

For example, I have two sentences like:


system for user interface

user interface machine

… and their respective vectors after tf-idf, followed by normalisation using LSI, for example [1, 0.5] and [0.5, 1].

How do I measure the similarity between these vectors?
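For reference, the cosine similarity of two vectors a and b is their dot product divided by the product of their Euclidean lengths. Applied to the example vectors from the question, it works out as follows:

```latex
\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
           = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}

\mathbf{a} = [1, 0.5],\quad \mathbf{b} = [0.5, 1]:\qquad
\cos\theta = \frac{1\cdot 0.5 + 0.5\cdot 1}{\sqrt{1^2 + 0.5^2}\,\sqrt{0.5^2 + 1^2}}
           = \frac{1}{\sqrt{1.25}\,\sqrt{1.25}} = \frac{1}{1.25} = 0.8
```

A value of 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions.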

Recommended answer

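import Jama.Matrix;

// Assumes each document is an n x 1 column vector (a Jama Matrix).
public class CosineSimilarity extends AbstractSimilarity {

  @Override
  protected double computeSimilarity(Matrix sourceDoc, Matrix targetDoc) {
    // Dot product: (1 x n) * (n x 1) yields a 1 x 1 matrix.
    double dotProduct = sourceDoc.transpose().times(targetDoc).get(0, 0);
    // Product of the Euclidean (Frobenius) norms of the two vectors.
    double normProduct = sourceDoc.normF() * targetDoc.normF();
    return dotProduct / normProduct;
  }
}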

I recently implemented tf-idf for my Information Retrieval unit at university, and I used this cosine similarity method, which relies on Jama: Java Matrix Package.
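If adding Jama as a dependency is not an option, the same computation can be written over plain double[] arrays. This is a minimal sketch, not the code from the linked source; the class name CosineSimilarityDemo and the method cosineSimilarity are hypothetical. It sums the element-wise products directly, so it also handles negative components, which LSI-projected vectors can contain.

```java
// Hypothetical, dependency-free sketch of cosine similarity on plain arrays.
public class CosineSimilarityDemo {

  // Returns dot(a, b) / (||a|| * ||b||).
  // Throws if the lengths differ or either vector is all zeros.
  static double cosineSimilarity(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("Vectors must have the same length");
    }
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];     // accumulate the dot product
      normA += a[i] * a[i];   // accumulate the squared Euclidean norm of a
      normB += b[i] * b[i];   // accumulate the squared Euclidean norm of b
    }
    if (normA == 0.0 || normB == 0.0) {
      throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    // The example vectors from the question.
    double[] v1 = {1.0, 0.5};
    double[] v2 = {0.5, 1.0};
    System.out.println(cosineSimilarity(v1, v2));
  }
}
```

Running main prints a value of about 0.8 for the question's vectors. If both vectors are already L2-normalised (unit length), the denominator is 1 and the result reduces to the plain dot product.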

For the full source code, see IR Math with Java: Similarity Measures, a really good resource that covers quite a few different similarity measures.
