How to find similarity between two questions even though the words differ

Problem description

Is there any way to find out whether two strings have similar meanings, even though the words in the strings differ?

So far I have tried fuzzywuzzy, Levenshtein distance, and cosine similarity to match the strings, but all of them match the words, not the meaning of the words.

from fuzzywuzzy import fuzz

# levenshtein_ratio_and_distance is a user-defined helper (not part of
# fuzzywuzzy) that must be defined before this code runs.

Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
Token_Sort_Ratio = fuzz.token_sort_ratio(Str1,Str2)
Ratio1 = fuzz.ratio(Str1.lower(),Str3.lower())
Partial_Ratio1 = fuzz.partial_ratio(Str1.lower(),Str3.lower())
Token_Sort_Ratio1 = fuzz.token_sort_ratio(Str1,Str3)
print("fuzzywuzzy")
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str2," ",Partial_Ratio)
print(Str1," ",Str2," ",Token_Sort_Ratio)
print(Str1," ",Str3," ",Ratio1)
print(Str1," ",Str3," ",Partial_Ratio1)
print(Str1," ",Str3," ",Token_Sort_Ratio1)
print("levenshtein ratio")
Ratio = levenshtein_ratio_and_distance(Str1,Str2,ratio_calc = True)
Ratio1 = levenshtein_ratio_and_distance(Str1,Str3,ratio_calc = True)
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str3," ",Ratio1)

output:
fuzzywuzzy
what are types of negotiation   what are advantages of negotiation   86
what are types of negotiation   what are advantages of negotiation   76
what are types of negotiation   what are advantages of negotiation   73
what are types of negotiation   what are categories of negotiation   86
what are types of negotiation   what are categories of negotiation   76
what are types of negotiation   what are categories of negotiation   73
levenshtein ratio
what are types of negotiation   what are advantages of negotiation   0.8571428571428571
what are types of negotiation   what are categories of negotiation   0.8571428571428571
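The code above calls `levenshtein_ratio_and_distance` without defining it. Below is a minimal sketch of such a helper, written under an assumed convention: with `ratio_calc=True` a substitution costs 2 and the ratio is `(len(s) + len(t) - distance) / (len(s) + len(t))`. It is a hypothetical reconstruction, not the asker's actual code. Under this convention both pairs score exactly 6/7 (about 0.857), which matches the output above and shows that the score reflects only shared characters, not meaning.

```python
def levenshtein_ratio_and_distance(s, t, ratio_calc=False):
    """Hypothetical helper: Levenshtein distance, or a similarity ratio.

    With ratio_calc=True, a substitution costs 2 (assumed convention) and the
    result is (len(s) + len(t) - distance) / (len(s) + len(t)).
    """
    rows, cols = len(s) + 1, len(t) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i  # delete all of s[:i]
    for j in range(1, cols):
        dist[0][j] = j  # insert all of t[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            if s[i - 1] == t[j - 1]:
                cost = 0
            else:
                cost = 2 if ratio_calc else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    if ratio_calc:
        total = len(s) + len(t)
        return (total - dist[-1][-1]) / total
    return dist[-1][-1]


str1 = "what are types of negotiation"
str2 = "what are advantages of negotiation"
str3 = "what are categories of negotiation"
print(levenshtein_ratio_and_distance(str1, str2, ratio_calc=True))  # → 0.8571428571428571
print(levenshtein_ratio_and_distance(str1, str3, ratio_calc=True))  # → 0.8571428571428571
```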



Expected output:
"what are the types of negotiation skill?"
"what are the categories in negotiation skill?"
output: similar
"what are the types of negotiation skill?"
"what are the advantages of negotiation skill?"
output: not similar


Recommended answer

You want to score the semantic similarity of two strings.

Fuzzywuzzy and Levenshtein distance score only character-level distance.

You need to account for semantic information, so you need a semantic representation of your strings.

A simple but effective method might consist of the following steps:


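  1. Using pre-trained word embeddings for your language, compute two vectors representing the two strings (e.g., FastText get_sentence_vector: https://fasttext.cc/docs/zh-CN/python-module.html#model-object).

  2. Compute the cosine similarity between the two vectors (identical strings score 1; the closer the score is to 0, the less related the strings are).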
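The two steps can be sketched in plain Python. Everything here is a hypothetical illustration: `TOY_VECTORS` is a hand-made 4-dimensional embedding table (real pre-trained vectors have hundreds of dimensions), and `sentence_vector` averages unit-normalized word vectors, which is assumed to be similar in spirit to fastText's `get_sentence_vector`, not its actual implementation.

```python
import math

# Hypothetical toy word vectors, made up for illustration only.
# In practice these would come from pre-trained embeddings (e.g. fastText).
TOY_VECTORS = {
    "what": [1.0, 0.0, 0.0, 0.0],
    "are": [1.0, 0.0, 0.0, 0.0],
    "of": [1.0, 0.0, 0.0, 0.0],
    "negotiation": [0.0, 1.0, 0.0, 0.0],
    "types": [0.0, 0.0, 1.0, 0.0],
    "categories": [0.0, 0.2, 1.0, 0.0],  # deliberately close to "types"
    "advantages": [0.0, 0.0, 0.0, 1.0],  # deliberately unrelated to "types"
}


def sentence_vector(text, word_vectors):
    """Step 1: average of unit-normalized word vectors; unknown words are skipped."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    count = 0
    for word in text.lower().split():
        vec = word_vectors.get(word)
        if vec is None:
            continue  # toy lookup has no subword fallback, unlike fastText
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            continue
        total = [acc + x / norm for acc, x in zip(total, vec)]
        count += 1
    return [acc / count for acc in total] if count else total


def cosine_similarity(u, v):
    """Step 2: cosine of the angle between u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


str1 = "what are types of negotiation"
str2 = "what are advantages of negotiation"
str3 = "what are categories of negotiation"

v1, v2, v3 = (sentence_vector(s, TOY_VECTORS) for s in (str1, str2, str3))
print(round(cosine_similarity(v1, v2), 3))  # types vs advantages → 0.909
print(round(cosine_similarity(v1, v3), 3))  # types vs categories → 0.998
```

With a real fastText model you would load it with `fasttext.load_model(path)` and call `model.get_sentence_vector(text)` in place of `sentence_vector`. In this toy example both pairs score high because they share four of their five words; only the ranking separates them, so deciding "similar" versus "not similar" would need a threshold tuned on your own data.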

Of course, there are better and more complex methods. To understand this subject in depth, I suggest this post (https://medium.com/@adriensieg/text-similarities-da019229c894), which is rich in explanations and code implementations.
