What NLP tools to use to match phrases having similar meaning or semantics

Question

I am working on a project that requires me to match a phrase or keyword with a set of similar keywords. I need to perform semantic analysis to do this.

An example:

Relevant QT
cheap health insurance
affordable health insurance
low cost medical insurance
health plan for less
inexpensive health coverage

Common Meaning

low cost health insurance

Here the phrase under the Common Meaning column should match the phrases under the Relevant QT column. I looked at a bunch of tools and techniques to do this. S-Match seemed very promising, but I have to work in Python, not Java. Latent Semantic Analysis also looks good, but I think it's more for document classification based on a keyword than for keyword matching. I am somewhat familiar with NLTK. Could someone provide some insight on which direction I should take and which tools I should use?

Recommended answer

When Latent Semantic Analysis refers to a "document", it basically means any set of words that is longer than 1. You can use it to compute the similarity between a document and another document, between a word and another word, or between a word and a document. So you could certainly use it for your chosen application.
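As a rough illustration of the idea above, here is a minimal LSA sketch using only NumPy. It treats each phrase from the question as a tiny "document", builds a term-by-phrase count matrix, truncates its SVD, and ranks the phrases against a query by cosine similarity in the latent space. The three car-related phrases, the choice of k = 2, and all function names are assumptions made up for this example; a real application would need a far larger corpus for the latent dimensions to mean much.

```python
import numpy as np

# Phrases from the question plus three unrelated ones (added here as an
# assumption, purely to give the model something to contrast against).
phrases = [
    "cheap health insurance",
    "affordable health insurance",
    "low cost medical insurance",
    "health plan for less",
    "inexpensive health coverage",
    "cheap car repair",
    "auto repair service",
    "car repair shop",
]

vocab = sorted({w for p in phrases for w in p.split()})
index = {w: i for i, w in enumerate(vocab)}

def bag_of_words(text):
    """Term-count vector over the known vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in index:
            v[index[w]] += 1.0
    return v

# Term-by-phrase count matrix: one column per phrase.
A = np.column_stack([bag_of_words(p) for p in phrases])

# Truncated SVD: keep the k strongest latent dimensions (k is an assumed value).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]

def to_latent(text):
    """Project a phrase into the k-dimensional latent space."""
    return U_k.T @ bag_of_words(text)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank(query):
    """Return (similarity, phrase) pairs, most similar first."""
    q = to_latent(query)
    scored = [(cosine(q, to_latent(p)), p) for p in phrases]
    return sorted(scored, reverse=True)

for score, phrase in rank("low cost health insurance"):
    print(f"{score:.3f}  {phrase}")
```

The same projection works for single words, so word-to-word and word-to-phrase comparisons use the same `to_latent` and `cosine` helpers.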

Other algorithms that may be useful include:

  • Random indexing ( https://www.sics.se/~mange/papers/RI_intro.pdf ) is easy enough to implement yourself. There is also an implementation at https://code.google.com/p/airhead-research/ , but it's in Java, not Python.
  • Topic modeling ( http://psiexp.ss.uci.edu/research/papers/SteyversGriffithsLSABookFormatted.pdf ) - Python implementation at http://radimrehurek.com/gensim/tutorial.html
  • DISSECT ( http://clic.cimec.unitn.it/composes/toolkit/introduction.html ) - Python implementation at http://clic.cimec.unitn.it/composes/toolkit/installation.html
  • BEAGLE ( http://www.indiana.edu/~clcl/BEAGLE/Jones_Mewhort_PR.pdf ) - Python implementation at https://github.com/mike-lawrence/wikiBEAGLE
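
To give a feel for how little code random indexing needs, here is a simplified sketch in plain Python. It uses the document-based variant: every phrase in a small corpus gets a sparse random index vector of +1/-1 entries, each word accumulates the index vectors of the phrases it occurs in, and a phrase is represented as the sum of its word vectors. The corpus, the dimensionality, the number of non-zero entries, and all function names are assumptions chosen for illustration, not values taken from the paper linked above.

```python
import math
import random

DIM = 1000      # dimensionality of the random space (assumed value)
NONZERO = 10    # number of +1/-1 entries per index vector (assumed value)

def index_vector(rng):
    """Sparse ternary random vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(DIM), NONZERO)
    return {pos: rng.choice((-1, 1)) for pos in positions}

def add_into(target, vec):
    """Add sparse vector `vec` into sparse vector `target` in place."""
    for pos, val in vec.items():
        target[pos] = target.get(pos, 0) + val

def train(corpus, seed=0):
    """Give every phrase an index vector and accumulate, for each word,
    the index vectors of the phrases that word occurs in."""
    rng = random.Random(seed)
    word_vectors = {}
    for phrase in corpus:
        iv = index_vector(rng)
        for word in phrase.lower().split():
            add_into(word_vectors.setdefault(word, {}), iv)
    return word_vectors

def phrase_vector(phrase, word_vectors):
    """Sum the vectors of the known words in a phrase."""
    vec = {}
    for word in phrase.lower().split():
        add_into(vec, word_vectors.get(word, {}))
    return vec

def cosine(a, b):
    dot = sum(val * b.get(pos, 0) for pos, val in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Phrases from the question plus three unrelated ones (an assumption).
corpus = [
    "cheap health insurance",
    "affordable health insurance",
    "low cost medical insurance",
    "health plan for less",
    "inexpensive health coverage",
    "cheap car repair",
    "auto repair service",
    "car repair shop",
]

word_vectors = train(corpus)
query = phrase_vector("low cost health insurance", word_vectors)
for candidate in ("cheap health insurance", "car repair shop"):
    score = cosine(query, phrase_vector(candidate, word_vectors))
    print(f"{score:.3f}  {candidate}")
```

Because the index vectors are nearly orthogonal, phrases that share no context with the query score close to zero, while phrases built from words seen in the same contexts score high. Unlike LSA, no matrix decomposition is needed and new phrases can be added incrementally.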
