Effective way to calculate a similarity percentage between data sets

Problem description

I am currently working with User objects, each of which has many Goal objects. The Goal objects are not User-specific; that is, Users can share the same Goal. I am trying to devise a way to calculate a "similarity percentage" between two Users (i.e., taking into account how many Goals they share as well as how many Goals they do not share). Does anyone have experience with this type of situation? I am using Grails with MySQL, if that is helpful.

Thanks
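Because Goals are shared between Users, a relational schema usually stores this as a many-to-many relationship backed by a join table. The sketch below counts the two quantities the question asks about (goals shared, goals not shared) directly in SQL. It assumes a hypothetical join table `user_goal(user_id, goal_id)` with one row per pair, and uses Python's built-in `sqlite3` as a stand-in for MySQL; the table and column names are illustrative, not what Grails/GORM necessarily generates.

```python
import sqlite3


def goal_overlap_counts(conn, user_a, user_b):
    """Return (shared, distinct_total) goal counts for two users.

    Assumes a hypothetical join table user_goal(user_id, goal_id) whose
    primary key prevents duplicate (user, goal) rows.
    """
    # Goals held by both users: self-join the join table on goal_id.
    shared = conn.execute(
        """
        SELECT COUNT(*)
        FROM user_goal AS a
        JOIN user_goal AS b ON a.goal_id = b.goal_id
        WHERE a.user_id = ? AND b.user_id = ?
        """,
        (user_a, user_b),
    ).fetchone()[0]
    # Goals held by either user, each counted once.
    distinct_total = conn.execute(
        "SELECT COUNT(DISTINCT goal_id) FROM user_goal WHERE user_id IN (?, ?)",
        (user_a, user_b),
    ).fetchone()[0]
    return shared, distinct_total


# Demo data: user 1 has goals {1, 2, 3}, user 2 has goals {2, 4}.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_goal (user_id INTEGER, goal_id INTEGER, "
    "PRIMARY KEY (user_id, goal_id))"
)
conn.executemany(
    "INSERT INTO user_goal VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 2), (2, 4)],
)
shared, total = goal_overlap_counts(conn, 1, 2)
print(shared, total - shared)  # 1 3  (1 shared goal, 3 held by only one user)
```

The goals "not shared" are simply `distinct_total - shared`, so the two counts are all that is needed to build a ratio of shared to total goals.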

Recommended answer

The standard way to do this is the Jaccard similarity. If A is the set of goals of the first user and B is the set of goals of the second user, the Jaccard similarity is:

#(A intersect B)/#(A union B)
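For goals already loaded into memory, the formula maps directly onto set operations. A minimal Python sketch; the function name `jaccard_similarity` and the choice to return 1.0 when both users have no goals (where the formula is 0/0) are assumptions, not part of the answer:

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of goal IDs."""
    a, b = set(a), set(b)  # duplicate IDs are ignored
    union = a | b
    if not union:
        # Both users have no goals, so the ratio is 0/0.
        # Treating them as identical (1.0) is an assumed convention.
        return 1.0
    return len(a & b) / len(union)


print(jaccard_similarity([1, 2, 3], [2, 4]))  # 0.25
```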

This is the number of goals they share divided by the total number of distinct goals the two have between them (counting shared goals only once). So if the first user has goals A={1,2,3} and the second user has goals B={2,4}, then:

A intersect B = {2}
A union B = {1,2,3,4}

#(A intersect B)/#(A union B) = 1/4
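The same worked example can be checked in Python, using `fractions.Fraction` so the result stays the exact ratio 1/4 instead of a float:

```python
from fractions import Fraction

A = {1, 2, 3}  # first user's goals
B = {2, 4}     # second user's goals

shared = A & B  # intersection: {2}
either = A | B  # union: {1, 2, 3, 4}
similarity = Fraction(len(shared), len(either))

print(sorted(shared), sorted(either), similarity)  # [2] [1, 2, 3, 4] 1/4
```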

The Jaccard similarity is always between 0 (they share no goals) and 1 (they have the same goals), so you can get a percentage by multiplying it by 100.
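A short sketch of the percentage form. The function name `similarity_percentage` is hypothetical, and it assumes that two users with no goals at all count as 100% similar, since the formula itself is undefined in that case:

```python
def similarity_percentage(a, b):
    """Jaccard similarity of two goal collections, scaled to 0-100."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 100.0  # assumed convention for two empty goal sets
    return 100.0 * len(a & b) / len(union)


print(similarity_percentage({1, 2, 3}, {2, 4}))  # 25.0
print(similarity_percentage({1, 2}, {3}))        # 0.0
print(similarity_percentage({1, 2}, {2, 1}))     # 100.0
```

The two extremes match the bounds stated above: no shared goals gives 0, identical goal sets give 100.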

http://en.wikipedia.org/wiki/Jaccard_index
