Semantic similarity between sentences

Question

I am doing a project. I need an open-source tool or technique to find the semantic similarity between sentences, where I give two sentences as input and get a score (i.e., the semantic similarity) as output. Does anyone know of one? I hope I will get a reply soon. Thank you all.

Recommended answer

Salma, I'm afraid this is not the right forum for your question, as it's not directly related to programming. I recommend that you ask your question again on the Corpora list. You may also want to search their archives first.

Apart from that, your question is not precise enough, and I'll explain what I mean by that. I assume that your project is about computing the semantic similarity between sentences, and not about something else in which semantic similarity is just one component among many. If this is the case, then there are a few things to consider. First of all, it is not clear, from the perspective of either computational or theoretical linguistics, what the term 'semantic similarity' means exactly. There are numerous views and definitions of it, all depending on the type of problem to be solved, the tools and techniques at hand, the background of the person approaching the task, and so on. Consider these examples:


  1. Pete and Rob have found a dog near the station.
  2. Pete and Rob have never found a dog near the station.
  3. Pete and Rob both like programming a lot.
  4. Patricia found a dog near the station.
  5. It was a dog who found Pete and Rob under the snow.

Which of the sentences 2-5 are similar to 1? 2 is the exact opposite of 1, yet it is still about Pete and Rob (not) finding a dog. 3 is about Pete and Rob, but in a completely different context. 4 is about finding a dog near the station, although the finder is someone else. 5 is about Pete, Rob, a dog, and a 'finding' event, but in a different way than in 1. As for me, I would not be able to rank these examples according to their similarity, even without having to write a computer program.
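To make the difficulty concrete, here is a minimal sketch of a naive word-overlap (Jaccard) score applied to the five examples. The tokenizer, the function names and the choice of Jaccard are my own assumptions for illustration, not something proposed above. On these sentences the negated sentence 2 comes out as the closest match to sentence 1, which is exactly the kind of result you have to decide whether you want.

```python
import re

def tokens(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def jaccard(a, b):
    """Word-overlap score |A & B| / |A | B|, between 0 and 1."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty sentences are treated as identical
    return len(ta & tb) / len(ta | tb)

sentences = [
    "Pete and Rob have found a dog near the station.",
    "Pete and Rob have never found a dog near the station.",
    "Pete and Rob both like programming a lot.",
    "Patricia found a dog near the station.",
    "It was a dog who found Pete and Rob under the snow.",
]

# Score every example against sentence 1 (keys are the example numbers).
reference = sentences[0]
scores = {i + 1: jaccard(reference, s) for i, s in enumerate(sentences)}
for number, score in scores.items():
    print(number, round(score, 2))
```

A purely lexical measure like this knows nothing about negation, roles (who found whom) or context, so it can only serve as a baseline.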

In order to compute semantic similarity, you first need to decide what you want to be treated as 'semantically similar' and what not. To compute semantic similarity on the sentence level, you would ideally compare some kind of meaning representation of the sentences. Meaning representations normally come as logical formulas and are extremely complex to generate. However, there are tools which attempt to do this, e.g. Boxer.

As a simplistic but often practical approach, you could define semantic similarity as the sum of the similarities between the words in one sentence and the words in the other. This makes the problem a lot easier, although there are still some difficult issues to be addressed, since the semantic similarity of words is just as badly defined as that of sentences. If you want to get an impression of this, take a look at the book 'Lexical Semantics' by D.A. Cruse (1986). However, there are quite a number of tools and techniques for computing the semantic similarity between words. Some of them define it basically as the negative distance between two words in a taxonomy like WordNet or the Wikipedia taxonomy (see this paper which describes an API for this). Others compute semantic similarity by using statistical measures calculated over large text corpora. They are based on the insight that similar words occur in similar contexts. A third approach to calculating semantic similarity between sentences or words uses vector space models, which you may know from information retrieval. To get an overview of these latter techniques, take a look at chapter 8.5 in the book 'Foundations of Statistical Natural Language Processing' by Manning and Schütze.
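The word-based approach can be sketched roughly as follows. This is only an illustration under my own assumptions: the function names are made up, the raw sum is replaced by an average of best matches so that the score stays between 0 and 1, and `TOY_SIM` is a hypothetical hand-written table standing in for a real word-similarity measure (taxonomy-based or corpus-based).

```python
import re

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def exact_match(w1, w2):
    """Placeholder word similarity: 1.0 for identical words, else 0.0."""
    return 1.0 if w1 == w2 else 0.0

def directed_score(src, tgt, word_sim):
    """Average, over the words in src, of the best similarity to any word in tgt."""
    if not src or not tgt:
        return 0.0
    return sum(max(word_sim(w, t) for t in tgt) for w in src) / len(src)

def sentence_similarity(s1, s2, word_sim=exact_match):
    """Symmetric sentence score in [0, 1], assuming word_sim returns values in [0, 1]."""
    a, b = tokenize(s1), tokenize(s2)
    return 0.5 * (directed_score(a, b, word_sim) + directed_score(b, a, word_sim))

# Hypothetical toy table; a real system would plug in a proper word-level measure here.
TOY_SIM = {frozenset(("dog", "puppy")): 0.8}

def toy_word_sim(w1, w2):
    """Look up a word pair in the toy table; identical words score 1.0."""
    if w1 == w2:
        return 1.0
    return TOY_SIM.get(frozenset((w1, w2)), 0.0)

s1 = "Pete found a dog"
s2 = "Pete found a puppy"
plain = sentence_similarity(s1, s2)                # exact word matches only
graded = sentence_similarity(s1, s2, toy_word_sim) # 'dog' and 'puppy' count as close
print(plain, graded)
```

Keeping `word_sim` as a parameter means the sentence-level logic does not change when you swap in a different word-level measure. Note that this kind of score still ignores word order and negation.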

Hope this gets you on your feet for now.
