Measuring semantic similarity between two phrases


Question

I want to measure the semantic similarity between two phrases or sentences. Is there a framework that I can use directly and reliably?

I have already checked out this question (http://stackoverflow.com/questions/62328/is-there-an-algorithm-that-tells-the-semantic-similarity-of-two-phrases), but it's pretty old and I couldn't find a really helpful answer there. There was one link, but I found it unreliable.

e.g.:
I have a phrase: felt crushed
I have several choices: force inwards, pulverized, destroyed emotionally, reshaping, etc.
I want to find the term/phrase with the highest similarity to the first one.
The answer here is: destroyed emotionally.

The bigger picture is: I want to identify which frame from FrameNet matches a given verb, based on how the verb is used in a sentence.

Update: I found this library very useful for measuring similarity between two words. The ConceptNet similarity mechanism is also very good.

And this library for measuring semantic similarity between sentences.

If anyone has any insights, please share.

Recommended answer

This is a very complicated problem.

The main technique that I can think of (before going into more complicated NLP processes) would be to apply cosine similarity (or any other similarity metric) to each pair of phrases. Obviously, this solution would not work well on its own because of the non-matching problem: two sentences might refer to the same concept using different words.
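As a rough illustration of the non-matching problem, here is a minimal sketch of cosine similarity over plain bag-of-words counts, applied to the phrases from the question. The helper names `bag_of_words` and `cosine_similarity` are made up for this example, and whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter

def bag_of_words(phrase):
    """Lowercase the phrase, split on whitespace, and count each token."""
    return Counter(phrase.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = bag_of_words("felt crushed")
choices = ["force inwards", "pulverized", "destroyed emotionally", "reshaping"]

# No choice shares a token with the query, so every score comes out as zero.
scores = {c: cosine_similarity(query, bag_of_words(c)) for c in choices}
for choice, score in scores.items():
    print(f"{choice}: {score:.2f}")
```

Because none of the choices shares a word with "felt crushed", every score is zero, including the one for the intended answer "destroyed emotionally". The raw word overlap carries no signal here.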

To solve this issue, you should transform the initial representation of each phrase into one with a more "conceptual" meaning. One option would be to extend each word with its synonyms (e.g. using WordNet). Another option is to apply methods such as distributional semantics (DS) (http://liawww.epfl.ch/Publications/Archive/Besanconetal2001.pdf), which extend the representation of each term with the words most likely to appear with it.

Example: A representation of a document, {"car","race"}, would be transformed to {"car","automobile","race"} with synonyms, while with DS it would be something like {"car","wheel","road","pilot", ...}.

Obviously, this transformation won't be binary. Each term will have an associated weight.
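The weighted expansion could be sketched as follows. This is only an illustration: `RELATED_TERMS` is a hypothetical hand-written table standing in for a real WordNet or DS lookup, and its weights are invented for the example.

```python
import math

# Hypothetical table standing in for a WordNet or distributional-semantics
# lookup. The entries and weights are made up for illustration.
RELATED_TERMS = {
    "car": {"automobile": 0.9},
    "automobile": {"car": 0.9},
}

def expand(terms, related=RELATED_TERMS):
    """Map each original term to weight 1.0 and each related term to its weight."""
    weights = {}
    for term in terms:
        weights[term] = max(weights.get(term, 0.0), 1.0)
        for other, w in related.get(term, {}).items():
            weights[other] = max(weights.get(other, 0.0), w)
    return weights

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weighted term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

doc_a = ["car", "race"]
doc_b = ["automobile", "race"]

# Unexpanded: only "race" overlaps between the two documents.
plain = cosine_similarity(dict.fromkeys(doc_a, 1.0), dict.fromkeys(doc_b, 1.0))
# Expanded: "car" and "automobile" now appear in both vectors with weights.
expanded = cosine_similarity(expand(doc_a), expand(doc_b))
print(f"plain={plain:.3f} expanded={expanded:.3f}")
```

With these made-up weights, the similarity between {"car","race"} and {"automobile","race"} rises from 0.5 to about 0.996 after expansion. In practice the weights would come from the chosen resource, not from a hand-written table.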

I hope this helps.
