How to find the similarity between two questions even though the words differ
Problem description
Is there any way to tell whether two strings have a similar meaning, even when the words in them differ?
So far I have tried fuzzywuzzy, Levenshtein distance, and cosine similarity to match the strings, but all of them match the words, not the meaning of the words.
from fuzzywuzzy import fuzz
Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
Token_Sort_Ratio = fuzz.token_sort_ratio(Str1,Str2)
Ratio1 = fuzz.ratio(Str1.lower(),Str3.lower())
Partial_Ratio1 = fuzz.partial_ratio(Str1.lower(),Str3.lower())
Token_Sort_Ratio1 = fuzz.token_sort_ratio(Str1,Str3)
print("fuzzywuzzy")
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str2," ",Partial_Ratio)
print(Str1," ",Str2," ",Token_Sort_Ratio)
print(Str1," ",Str3," ",Ratio1)
print(Str1," ",Str3," ",Partial_Ratio1)
print(Str1," ",Str3," ",Token_Sort_Ratio1)
print("levenshtein ratio")
# levenshtein_ratio_and_distance is a helper defined elsewhere; it is not shown in the question
Ratio = levenshtein_ratio_and_distance(Str1,Str2,ratio_calc = True)
Ratio1 = levenshtein_ratio_and_distance(Str1,Str3,ratio_calc = True)
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str3," ",Ratio1)
output:
fuzzywuzzy
what are types of negotiation what are advantages of negotiation 86
what are types of negotiation what are advantages of negotiation 76
what are types of negotiation what are advantages of negotiation 73
what are types of negotiation what are categories of negotiation 86
what are types of negotiation what are categories of negotiation 76
what are types of negotiation what are categories of negotiation 73
levenshtein ratio
what are types of negotiation what are advantages of negotiation
0.8571428571428571
what are types of negotiation what are categories of negotiation
0.8571428571428571
expected output:
"what are the types of negotiation skill?"
"what are the categories in negotiation skill?"
output:similar
"what are the types of negotiation skill?"
"what are the advantages of negotiation skill?"
output:not similar
Recommended answer
You want to score the semantic similarity of two strings.
fuzzywuzzy and Levenshtein distance score only character-level distance.
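To make the point concrete, here is a minimal pure-Python sketch of Levenshtein distance. It is not the `levenshtein_ratio_and_distance` helper from the question (which is not shown); the function name `levenshtein` is just an illustrative choice. The distance depends only on which characters differ, so replacing "types" with "categories" or with "advantages" is treated as the same kind of change, regardless of meaning.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


q1 = "what are types of negotiation"
q2 = "what are advantages of negotiation"
q3 = "what are categories of negotiation"

# Both numbers count character edits only; nothing here knows that
# "types" and "categories" are near-synonyms.
print(levenshtein(q1, q2), levenshtein(q1, q3))
```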
You need to account for semantic information, so you need a semantic representation of your strings.
A simple but effective method is to:
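- Compute two vectors representing your two strings, using pretrained word embeddings for your language (for example, FastText's get_sentence_vector: https://fasttext.cc/docs/zh-CN/python-module.html#model-object )
- Compute the cosine similarity between the two vectors (1: equal strings; values near 0: completely different strings).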
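The two steps above can be sketched roughly as follows. To keep the example runnable without downloading a model, it uses a tiny hand-made table of 3-dimensional vectors (`TOY_VECTORS`, entirely made up for illustration) and averages them per sentence. In real use you would replace `sentence_vector` with vectors from a pretrained model, for example FastText's `get_sentence_vector`; the helper names `sentence_vector` and `cosine_similarity` are assumptions of this sketch.

```python
import math

# Hypothetical toy embeddings, hand-picked so that "types" and "categories"
# point in a similar direction and "advantages" does not.
# A real system would load pretrained vectors instead.
TOY_VECTORS = {
    "what":        [0.2, 0.1, 0.0],
    "are":         [0.1, 0.2, 0.0],
    "of":          [0.1, 0.1, 0.1],
    "negotiation": [0.5, 0.5, 0.2],
    "types":       [0.9, 0.1, 0.0],
    "categories":  [0.8, 0.2, 0.1],
    "advantages":  [0.0, 0.1, 0.9],
}


def sentence_vector(text):
    """Average the vectors of the known words in the sentence."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in: %r" % text)
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


s1 = "what are types of negotiation"
s2 = "what are advantages of negotiation"
s3 = "what are categories of negotiation"

sim_types_vs_advantages = cosine_similarity(sentence_vector(s1), sentence_vector(s2))
sim_types_vs_categories = cosine_similarity(sentence_vector(s1), sentence_vector(s3))

print("types vs advantages:", sim_types_vs_advantages)
print("types vs categories:", sim_types_vs_categories)
```

With these toy vectors the "types" / "categories" pair scores higher than the "types" / "advantages" pair. To get the similar / not similar labels from the question you would still need to pick a threshold, which depends on the embeddings you use and is best tuned on a few labelled examples.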
There are certainly better and more sophisticated methods. For a deeper understanding of this subject, I suggest this post (https://medium.com/@adriensieg/text-similarities-da019229c894), which is rich in explanations and code implementations.