如何计算两个字符串向量之间的余弦相似度 [英] How can I calculate Cosine similarity between two strings vectors

查看:255
本文介绍了如何计算两个字符串向量之间的余弦相似度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有2个维数为6的向量,并且我想要一个0到1之间的数字.

I have 2 vectors of dimensions 6 and I would like to have a number between 0 and 1.

a=c("HDa","2Pb","2","BxU","BuQ","Bve")

b=c("HCK","2Pb","2","09","F","G")

任何人都可以解释我该怎么做吗?

Can anyone explain what I should do?

推荐答案

使用lsa程序包和该程序包的手册

using the lsa package and the manual for this package

# create some files
library('lsa')
td = tempfile()
dir.create(td)
write( c("HDa","2Pb","2","BxU","BuQ","Bve"), file=paste(td, "D1", sep="/"))
write( c("HCK","2Pb","2","09","F","G"), file=paste(td, "D2", sep="/"))

# read files into a document-term matrix
myMatrix = textmatrix(td, minWordLength=1)

显示mymatrix对象的状态

myMatrix
#myMatrix
#       docs
#  terms D1 D2
#    2    1  1
#    2pb  1  1
#    buq  1  0
#    bve  1  0
#    bxu  1  0
#    hda  1  0
#    09   0  1
#    f    0  1
#    g    0  1
#    hck  0  1

# Calculate cosine similarity
res <- lsa::cosine(myMatrix[,1], myMatrix[,2])
res
#0.3333

这篇关于如何计算两个字符串向量之间的余弦相似度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆