Calculating a weighted similarity


Question

I have 2 data rows, and each of them has 4 fields.

Something like this:

        field1  field2  field3  field4
Row 1
Row 2

Now I have to compare these two records and calculate their similarity. I calculate the similarity for each field using cosine similarity.

So I end up with similarities something like this (0 signifying a weak similarity and 1 signifying a strong similarity):

field1: 0.12
field2: 0.67
field3: 1.00
field4: 0.93

I can now find the total similarity by averaging the values, but the problem is:
I want to add weights to the fields.

So if field2 has a higher weight than field1, the similarity of field2 should contribute more to the average similarity.

Can you suggest a formula or algorithm to satisfy such a requirement?

Recommended answer

Simple:

  1. Multiply each of the 4 values by its weight
  2. Add the results together
  3. Divide by the sum of the weights

    Examples

    • In the example, each of the fields can be thought of as having an equal weight of 1

      ((0.12 * 1) + (0.67 * 1) + (1.00 * 1) + (0.93 * 1)) / 4 = 0.68
      

    • Now if we want to make field2 worth 2x as much as the other fields

      // Weights are (1 + 2 + 1 + 1) = 5
      ((0.12 * 1) + (0.67 * 2) + (1.00 * 1) + (0.93 * 1)) / 5 = 0.678
      

    • If we want field3 to have 100 times the weight (field2 is still 2x)

      // Weights are (1 + 2 + 100 + 1) = 104
      ((0.12 * 1) + (0.67 * 2) + (1.00 * 100) + (0.93 * 1)) / 104 = 0.9845192307692308
      

    • ((field1 * field1_weight) + (field2 * field2_weight) + ... + (fieldn * fieldn_weight)) / (field1_weight + field2_weight + ... + fieldn_weight) = weighted_average
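
      As a rough sketch, the formula above could be written in Python along these lines. The function name `weighted_average`, the input checks, and the variable names are my own assumptions for illustration and not part of the original answer.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(value * weight) / sum(weight) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: a zero weight sum is treated as invalid input.
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Per-field similarities from the question (field1..field4).
similarities = [0.12, 0.67, 1.00, 0.93]

equal = weighted_average(similarities, [1, 1, 1, 1])          # about 0.68
field2_double = weighted_average(similarities, [1, 2, 1, 1])  # about 0.678
```

      Passing the weights as a separate list keeps them easy to tune without touching the per-field similarity calculation.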
      

      Fractional weights

      The formula works just the same if you give fractions as weights. For example, if you would like the 4th field to carry 150% of the weight of the other fields (that is, 50% more), you can assign it a weight of 1.5

      // Weights are (1 + 1 + 1 + 1.5) = 4.5
      ((0.12 * 1) + (0.67 * 1) + (1.00 * 1) + (0.93 * 1.5)) / 4.5 = 0.7077777777777778
      

      Weights are relative

      You don't need to start with each of the weights set to 1; you can use 100 or 1000 if you like.

      For example, if the weights for all 4 fields were 100, the final average would be the same as if they were all 1.
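
      A quick way to convince yourself of this is to compute the average with two weight sets that differ only by a constant factor. This is just an illustrative check; the helper `weighted_average` below is a hypothetical name, and `math.isclose` is used because floating-point results may differ in the last digits.

```python
import math


def weighted_average(values, weights):
    # Multiply each value by its weight, sum, then divide by the sum of weights.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


similarities = [0.12, 0.67, 1.00, 0.93]

with_ones = weighted_average(similarities, [1, 1, 1, 1])
with_hundreds = weighted_average(similarities, [100, 100, 100, 100])

# Scaling every weight by the same factor leaves the result unchanged
# (up to floating-point rounding).
same_when_scaled = math.isclose(with_ones, with_hundreds)

# The same holds for unequal weights: only the ratios matter.
same_ratio = math.isclose(
    weighted_average(similarities, [1, 2, 1, 1]),
    weighted_average(similarities, [50, 100, 50, 50]),
)
```

      In other words, only the ratios between the weights matter, so you can pick whatever scale is most convenient.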

      Wikipedia: Weighted arithmetic mean
