Ways to calculate similarity

Question

I am building a community website that requires me to calculate the similarity between any two users. Each user is described by the following attributes:

age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junky) and others.

Can anyone tell me how to go about this problem or point me to some resources?

Recommended answer

Another way is to compute (in R) all the pairwise dissimilarities (distances) between observations in the data set. The original variables may be of mixed types. Nominal, ordinal, and (a)symmetric binary data are handled using Gower's general dissimilarity coefficient (Gower, J. C. (1971) A general coefficient of similarity and some of its properties, Biometrics 27, 857–874). For more, check out this on page 47. If x contains any columns of these data types, Gower's coefficient will be used as the metric.
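As a rough sketch of what Gower's coefficient computes for mixed data, the per-variable contributions are averaged. The symbols below are notation introduced here for illustration, not taken from the answer, and the treatment of ordinal variables is left out:

```latex
d_{ij} = \frac{\sum_{k=1}^{p} \delta_{ijk}\, d_{ijk}}{\sum_{k=1}^{p} \delta_{ijk}},
\qquad
d_{ijk} =
\begin{cases}
\dfrac{|x_{ik} - x_{jk}|}{R_k} & \text{numeric variable } k \text{ with range } R_k,\\[1ex]
0 & \text{nominal variable } k,\ x_{ik} = x_{jk},\\
1 & \text{nominal variable } k,\ x_{ik} \neq x_{jk},
\end{cases}
```

Here \delta_{ijk} is 1 when variable k can be compared for users i and j (both values present) and 0 otherwise. For instance, assuming age is treated as numeric with a sample range of 29 - 10 = 19, two users aged 10 and 14 who differ in skin type but share hair type and lifestyle get d = (4/19 + 1 + 0 + 0) / 4 ≈ 0.303.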

For example:

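# Toy data: five users with age, skin type, hair type, and lifestyle
x1 <- factor(c(10, 12, 25, 14, 29))
x2 <- factor(c("oily", "dry", "dry", "dry", "oily"))
x3 <- factor(c("medium", "short", "medium", "medium", "long"))
x4 <- factor(c("active outdoor lover", "TV junky", "TV junky", "active outdoor lover", "TV junky"))
# Note: cbind() on factors returns a matrix of integer level codes
x <- cbind(x1, x2, x3, x4)

library(cluster)
# On this numeric matrix, daisy() computes Euclidean distances between the codes
daisy(x, metric = "euclidean")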

You will get:

Dissimilarities :
         1        2        3        4
2 2.000000                           
3 3.316625 2.236068                  
4 2.236068 1.732051 1.414214         
5 4.242641 3.741657 1.732051 2.645751
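
Note that the call above passes a matrix of factor level codes with metric = "euclidean", so the printed values are Euclidean distances between those codes rather than Gower dissimilarities. A minimal sketch of a Gower-based variant follows, assuming age should be treated as a numeric variable and the other attributes as nominal; the names `users`, `d`, and `sim` are made up for illustration:

```r
library(cluster)

# Hypothetical toy data: same values as in the example, but kept in a
# data frame so that column types survive (age numeric, the rest factors)
users <- data.frame(
  age       = c(10, 12, 25, 14, 29),
  skin      = factor(c("oily", "dry", "dry", "dry", "oily")),
  hair      = factor(c("medium", "short", "medium", "medium", "long")),
  lifestyle = factor(c("active outdoor lover", "TV junky", "TV junky",
                       "active outdoor lover", "TV junky"))
)

# Gower dissimilarities: numeric columns are scaled by their range,
# nominal columns contribute 0 (same level) or 1 (different level)
d <- daisy(users, metric = "gower")

# One simple way to turn dissimilarities in [0, 1] into similarities
sim <- 1 - as.matrix(d)
print(round(sim, 3))
```

Whether 1 minus the dissimilarity is the right similarity score for the site is a design choice; per-variable weights can also be supplied to `daisy()` if some attributes should count more than others.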

If you are interested in a method for dimensionality reduction for categorical data (also a way to arrange variables into homogeneous clusters), check this
