How to compare two strings by meaning?


Question


I want the users of my node.js application to write down ideas, which then get stored in a database. So far so good, but I don't want redundant entries in that table, so I decided to check for similarity using this package: https://www.npmjs.com/package/string-similarity-js


Do you know a way to compare two strings by meaning? For example, I would like a high similarity score for "using public transport" vs. "driving by train", a pair on which the package above performs very poorly.

Recommended answer


Comparing the meaning of two strings is still an area of ongoing research. If you really want to solve the problem (or to get really good performance out of your language model), you should consider getting a PhD.


As an out-of-the-box solution for now: I found this GitHub repo, which uses Google's BERT model to get the embeddings of two sentences. In theory, two sentences share the same meaning if their embeddings are similar.

https://github.com/UKPLab/sentence-transformers

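# the following is simplified from their README.md
import scipy.spatial.distance
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('bert-base-nli-mean-tokens')

# Two example sentences to compare
S1 = ['A man is eating a food.']
S2 = ['A man is eating pasta.']

# encode() takes a list of sentences and returns one embedding per sentence
s1_embedding = embedder.encode(S1)
s2_embedding = embedder.encode(S2)

# Cosine distance between the two embeddings; similarity = 1 - distance
dist = scipy.spatial.distance.cdist(s1_embedding, s2_embedding, "cosine")[0]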

Example output (copied from their README.md)

Query: A man is eating pasta.
Top 5 most similar sentences in corpus:
A man is eating a piece of bread. (Score: 0.8518)
A man is eating a food. (Score: 0.8020)
A monkey is playing drums. (Score: 0.4167)
A man is riding a horse. (Score: 0.2621)
A man is riding a white horse on an enclosed ground. (Score: 0.2379)
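
To tie this back to the goal in the question (rejecting redundant ideas before they are stored), one possible approach is to compare the embedding of a new idea against the embeddings of the stored ones and flag anything above a cutoff. The sketch below is only an illustration: `cosine_similarity` and `find_similar` are hypothetical helper names, the 0.8 threshold is an arbitrary assumption that would need tuning on real data, and the three-dimensional vectors are toy stand-ins for real sentence embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def find_similar(new_embedding, stored, threshold=0.8):
    """Return (text, score) pairs from `stored` scoring at or above
    `threshold`, best match first. The threshold is an assumed value."""
    matches = []
    for text, embedding in stored:
        score = cosine_similarity(new_embedding, embedding)
        if score >= threshold:
            matches.append((text, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Toy vectors standing in for real sentence embeddings (assumption).
stored_ideas = [
    ("using public transport", [0.9, 0.1, 0.2]),
    ("planting more trees", [0.1, 0.9, 0.3]),
]
new_idea = [0.8, 0.2, 0.2]

duplicates = find_similar(new_idea, stored_ideas)
for text, score in duplicates:
    print(f"Possible duplicate: {text} (score: {score:.4f})")
```

In a real application the vectors would come from the embedding model rather than being written by hand, and a flagged match could be shown to the user for confirmation instead of being rejected outright.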
