What NLP tools to use to match phrases having similar meaning or semantics


Question

I am working on a project that requires me to match a phrase or keyword against a set of similar keywords, so I need to perform semantic analysis on them.

An example:

Relevant QT                      Common Meaning
cheap health insurance           low cost health insurance
affordable health insurance
low cost medical insurance
health plan for less
inexpensive health coverage

Here the phrase under the Common Meaning column should match the phrases under the Relevant QT column. I looked at a number of tools and techniques for doing this. S-Match seemed very promising, but I have to work in Python, not Java. Latent Semantic Analysis also looks good, but I think it is meant more for classifying documents based on a keyword than for keyword-to-keyword matching. I am somewhat familiar with NLTK. Could someone provide some insight into which direction I should take and which tools I should use?

Recommended answer

When Latent Semantic Analysis refers to a "document", it basically means any set of more than one word. You can use it to compute the similarity between one document and another, between one word and another, or between a word and a document. So you could certainly use it for your chosen application.
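To make that concrete, here is a minimal sketch of LSA over the phrases from the question, using NumPy's SVD. It is an illustration under assumptions rather than a tuned implementation: each phrase is treated as one "document", raw counts are used without any weighting, and `k = 2`, `fold_in`, and `cosine` are choices and names made up for this example. With a corpus this small the latent space carries little information; in practice it would be built from a much larger body of text.

```python
import numpy as np

# Toy corpus: each phrase from the question is treated as one "document".
phrases = [
    "cheap health insurance",
    "affordable health insurance",
    "low cost medical insurance",
    "health plan for less",
    "inexpensive health coverage",
    "low cost health insurance",
]

vocab = sorted({w for p in phrases for w in p.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix: rows are words, columns are phrases.
X = np.zeros((len(vocab), len(phrases)))
for j, p in enumerate(phrases):
    for w in p.split():
        X[index[w], j] += 1.0

# Truncated SVD: keep the k largest singular values (k = 2 is an arbitrary choice).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vecs = U[:, :k] * s[:k]      # one row per word
doc_vecs = Vt[:k, :].T * s[:k]    # one row per phrase


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def fold_in(text):
    """Project a word or phrase into the latent space (unknown words are ignored)."""
    q = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            q[index[w]] += 1.0
    return q @ U[:, :k]


# Document-document, word-word, and word-document similarity.
print(cosine(doc_vecs[0], doc_vecs[5]))
print(cosine(word_vecs[index["cheap"]], word_vecs[index["affordable"]]))
print(cosine(fold_in("insurance"), doc_vecs[3]))
```

A single word is handled here as a one-word pseudo-document via `fold_in`, which keeps word-to-document comparisons in the same coordinates as `doc_vecs`.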

Other algorithms that may be useful include:

  • Random indexing ( https://www.sics.se/~mange/papers/RI_intro.pdf ) is straightforward to implement yourself. There is also an implementation within https://code.google.com/p/airhead-research/ , but it's in Java, not Python.
  • Topic modeling ( http://psiexp.ss.uci.edu/research/papers/SteyversGriffithsLSABookFormatted.pdf ) - Python implementation at http://radimrehurek.com/gensim/tutorial.html
  • DISSECT ( http://clic.cimec.unitn.it/composes/toolkit/introduction.html ) - Python implementation at http://clic.cimec.unitn.it/composes/toolkit/installation.html
  • BEAGLE ( http://www.indiana.edu/~clcl/BEAGLE/Jones_Mewhort_PR.pdf ) - Python implementation at https://github.com/mike-lawrence/wikiBEAGLE
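
As a rough illustration of why random indexing is easy to implement yourself, here is a minimal sketch using only the Python standard library. It is not taken from the linked paper's code: the names `DIM`, `NONZERO`, `train`, and `phrase_vector` are made up for this example, the parameter values are arbitrary, and the "context" of a word is assumed to be the other words in the same phrase. Each word gets a sparse random index vector, each word's context vector accumulates the index vectors of its neighbours, and a phrase is represented by the sum of its words' context vectors.

```python
import math
import random

DIM = 300      # dimensionality of the reduced space (assumed value)
NONZERO = 10   # number of +1/-1 entries per index vector (assumed value)


def make_index_vector(rng):
    """Sparse ternary vector: mostly zeros, a few randomly placed +1/-1 entries."""
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((1.0, -1.0))
    return vec


def train(corpus, seed=0):
    """Build context vectors: each word accumulates the index vectors of
    the other words that occur in the same phrase."""
    rng = random.Random(seed)
    index_vecs, context_vecs = {}, {}
    for phrase in corpus:
        words = phrase.split()
        for w in words:
            if w not in index_vecs:
                index_vecs[w] = make_index_vector(rng)
                context_vecs[w] = [0.0] * DIM
        for i, w in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    ctx = context_vecs[w]
                    for d, v in enumerate(index_vecs[other]):
                        ctx[d] += v
    return context_vecs


def phrase_vector(phrase, context_vecs):
    """Sum the context vectors of the known words in a phrase."""
    total = [0.0] * DIM
    for w in phrase.split():
        for d, v in enumerate(context_vecs.get(w, ())):
            total[d] += v
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


corpus = [
    "cheap health insurance",
    "affordable health insurance",
    "low cost medical insurance",
    "health plan for less",
    "inexpensive health coverage",
]
vecs = train(corpus)
query = phrase_vector("low cost health insurance", vecs)
for p in corpus:
    print(p, cosine(query, phrase_vector(p, vecs)))
```

The five phrases are far too little data for meaningful similarities; the point is only the mechanics. A real setup would train on a large corpus and typically use a sliding window of neighbouring words as the context.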
