Using WordNet to determine semantic similarity between two texts?

Question

How can you determine the semantic similarity between two texts in python using WordNet?

The obvious preprocessing would be removing stop words and stemming, but then what?
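Here is a minimal sketch of that preprocessing step in Python. It assumes simple regex tokenization over ASCII letters and a small illustrative stop-word list. `DEFAULT_STOP_WORDS` and `preprocess` are hypothetical names, not part of any library. The stemmer is passed in as a function so the sketch runs without extra dependencies.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable, Optional

# Small illustrative stop list (an assumption); NLTK's
# stopwords.words("english") is the usual fuller choice.
DEFAULT_STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "are", "on", "it", "that"}
)


def preprocess(
    text: str,
    stop_words: Iterable[str] = DEFAULT_STOP_WORDS,
    stem: Optional[Callable[[str], str]] = None,
) -> list[str]:
    """Lowercase, split into runs of ASCII letters, drop stop words, optionally stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stop = set(stop_words)
    kept = [t for t in tokens if t not in stop]
    return [stem(t) for t in kept] if stem else kept
```

With NLTK installed, you could call `preprocess(text, stop_words=stopwords.words("english"), stem=PorterStemmer().stem)` after `from nltk.corpus import stopwords` and `from nltk.stem import PorterStemmer`. The stop-word list needs `nltk.download("stopwords")`. Porter stems such as "comput" are often not WordNet entries. If the tokens will be looked up in WordNet afterwards, lemmatizing (for example with `nltk.stem.WordNetLemmatizer().lemmatize`) usually works better than stemming.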

The only way I can think of would be to calculate the WordNet path distance between each word in the two texts. This is standard for unigrams. But these are large (400-word) natural-language documents whose words are not in any particular order or structure (other than that imposed by English grammar). So which words would you compare between the texts? How would you do this in Python?
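This sketch shows one common way to turn word-to-word WordNet scores into a text score. It is an assumption, not a standard recipe:

- For every word in one text, take its best match in the other text.
- Average those best-match scores.
- Average both directions so the result is symmetric.

`wordnet_path_sim`, `directed_score` and `text_similarity` are hypothetical names. Only `wn.synsets` and `Synset.path_similarity` are real NLTK calls, and they need the WordNet corpus (`nltk.download("wordnet")`). The word-level function is a parameter so the aggregation can be tried without NLTK.

```python
from __future__ import annotations

from functools import lru_cache
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


@lru_cache(maxsize=None)
def wordnet_path_sim(w1: str, w2: str) -> float:
    """Best WordNet path similarity over all synset pairs of two words (0.0 if none)."""
    # Imported lazily so the aggregation functions below work without NLTK installed.
    from nltk.corpus import wordnet as wn

    best = 0.0
    for s1 in wn.synsets(w1):
        for s2 in wn.synsets(w2):
            sim = s1.path_similarity(s2)  # None when no path connects the synsets
            if sim is not None and sim > best:
                best = sim
    return best


def directed_score(src: Sequence[str], dst: Sequence[str], sim: WordSim) -> float:
    """Average, over words in src, of each word's best similarity to any word in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(sim(a, b) for b in dst) for a in src) / len(src)


def text_similarity(
    a: Sequence[str], b: Sequence[str], sim: WordSim = wordnet_path_sim
) -> float:
    """Symmetric text similarity: mean of the two directed best-match scores."""
    return (directed_score(a, b, sim) + directed_score(b, a, sim)) / 2
```

With 400-word texts this means up to 400 × 400 word pairs per direction, and each pair expands to every synset pair. Caching the word-level score (the `lru_cache` above) and deduplicating tokens therefore matter in practice.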

Answer

One thing you can do is:

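  1. Remove the stop words.
  2. Within each document, find as many words as possible whose synonyms and antonyms have the maximal intersection with those of the other words in the same document. Call these "the important words".
  3. Check whether the sets of important words of the two documents are the same. The closer they are, the more semantically similar the documents.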
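The three steps above can be sketched as follows. This is one interpretation, not the answerer's code, and it makes these assumptions:

- A word's "related set" is the WordNet lemma names of its synsets plus their antonyms.
- A word's importance is the total overlap of its related set with those of the other distinct words in the same document.
- The top `k` words count as "important".
- The two important-word sets are compared with Jaccard similarity.

`k`, `wordnet_related`, `important_words`, `jaccard` and `doc_similarity` are hypothetical names. `Synset.lemmas()`, `Lemma.name()` and `Lemma.antonyms()` are real NLTK calls. The input word lists are assumed to be stop-word-free already (step 1).

```python
from __future__ import annotations

from typing import Callable, Iterable

Related = Callable[[str], set]


def wordnet_related(word: str) -> set[str]:
    """Lemma names of all synsets of `word`, plus the names of their antonyms."""
    from nltk.corpus import wordnet as wn  # lazy import: needs NLTK + wordnet data

    related: set[str] = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            related.add(lemma.name().lower())
            related.update(ant.name().lower() for ant in lemma.antonyms())
    return related


def important_words(
    words: Iterable[str], related: Related = wordnet_related, k: int = 10
) -> set[str]:
    """Step 2 (assumed reading): top-k words whose related sets overlap most with the others'."""
    vocab = sorted(set(words))
    rel = {w: related(w) for w in vocab}
    scores = {w: sum(len(rel[w] & rel[o]) for o in vocab if o != w) for w in vocab}
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))  # ties broken alphabetically
    return set(ranked[:k])


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def doc_similarity(
    words_a: Iterable[str],
    words_b: Iterable[str],
    related: Related = wordnet_related,
    k: int = 10,
) -> float:
    """Step 3: how close the two documents' important-word sets are."""
    return jaccard(important_words(words_a, related, k), important_words(words_b, related, k))
```

Jaccard is only one way to measure how close the two sets are. A softer option is to compare the important words with a WordNet similarity score instead of requiring exact matches.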

There is another way. Compute sentence trees from the sentences in each doc, then compare the two forests. I did some similar work for a course a long time ago. Here's the code (keep in mind that this was a long time ago and it was for a class, so the code is extremely hacky, to say the least).
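This is a rough sketch of the forest-comparison step only, not the answerer's code. It assumes each sentence has already been parsed by some constituency parser into nested tuples such as `("S", ("NP", "cats"), ("VP", "sleep"))`. One simple way to compare two forests is to count the grammar productions each forest uses and take the Dice coefficient of the two multisets. That measure is a substitute chosen for illustration; tree kernels or tree edit distance are more thorough alternatives. `productions` and `forest_similarity` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Union

# A parse tree is a leaf word (str) or a tuple (label, child1, child2, ...).
Tree = Union[str, tuple]


def productions(tree: Tree) -> Counter:
    """Count the productions (label -> child labels or words) used in one tree."""
    counts: Counter = Counter()
    if isinstance(tree, str):
        return counts  # a leaf contributes no production of its own
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        counts.update(productions(child))
    return counts


def forest_similarity(forest_a: Iterable[Tree], forest_b: Iterable[Tree]) -> float:
    """Dice coefficient between the production multisets of two forests."""
    a: Counter = Counter()
    b: Counter = Counter()
    for tree in forest_a:
        a.update(productions(tree))
    for tree in forest_b:
        b.update(productions(tree))
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum((a & b).values())  # multiset intersection keeps the smaller count
    return 2 * shared / total
```

For example, the trees for "cats sleep" and "dogs sleep" share the `S -> NP VP` and `VP -> sleep` productions but not the `NP` one, so they score 2/3.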

Hope this helps.
