Negation handling in NLP


Problem description

I'm currently working on a project where I want to extract emotion from text. Because I'm using conceptnet5 (a semantic network), I can't simply prefix the words in a sentence that contains a negation word, as the prefixed words would not show up in conceptnet5's API.

Here's an example:

The movie wasn't that good.

Hence, I figured that I could use wordnet's lemma functionality to replace adjectives in sentences that contain negation-words like (not, ...).

In the previous example, the algorithm would detect wasn't and replace it with was not. Further, it would detect the negation word not, and replace good with its antonym bad. The sentence would read:

The movie was that bad.
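
The replacement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the small ANTONYMS table is a hand-written stand-in for a WordNet antonym lookup, and the NEGATIONS and INTENSIFIERS lists as well as the function name replace_negated_adjectives are made up for this example.

```python
# Hypothetical stand-in for a WordNet antonym lookup; a real system would
# query a lexical resource instead of this hand-written table.
ANTONYMS = {"good": "bad", "bad": "good", "happy": "sad", "sad": "happy"}

# Assumed, non-exhaustive word lists used only for this illustration.
NEGATIONS = {"not", "n't", "never"}
INTENSIFIERS = {"that", "very", "so", "really", "too"}


def replace_negated_adjectives(tokens):
    """Drop a negation word and flip the first known adjective after it."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() in NEGATIONS:
            # Look ahead past intensifiers for a word with a known antonym.
            j = i + 1
            while j < len(tokens) and tokens[j].lower() in INTENSIFIERS:
                j += 1
            if j < len(tokens) and tokens[j].lower() in ANTONYMS:
                out.extend(tokens[i + 1:j])  # keep the intensifiers
                out.append(ANTONYMS[tokens[j].lower()])
                i = j + 1  # skip the negation and the original adjective
                continue
        # No known adjective follows: leave the token (and negation) as-is.
        out.append(tok)
        i += 1
    return out


tokens = ["The", "movie", "was", "not", "that", "good", "."]
print(" ".join(replace_negated_adjectives(tokens)))
```

When no word with a known antonym follows the negation, the sketch leaves the sentence untouched, which is one of the many cases where this approach gives up or goes wrong.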

While I see that this isn't the most elegant way, and that it probably produces the wrong result in many cases, I'd still like to handle negation that way, as I frankly don't know any better approach.

Now to my problem: unfortunately, I did not find any library that would let me replace all occurrences of contracted negation words (wasn't => was not). I could do it manually by replacing the occurrences with a regex, but then I would be stuck with the English language.

Therefore I'd like to ask whether any of you know a library, function or better method that could help me here. Currently I'm using Python's nltk, but it doesn't seem to contain such functionality, though I may be wrong.

Thanks in advance :)

Solution

Cases like wasn't can be handled simply by tokenization (tokens = nltk.word_tokenize(sentence)): wasn't is split into the two tokens was and n't.
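
Once was and n't are separated, the n't piece only needs to be mapped to not. The sketch below imitates that split with a regular expression instead of calling nltk, so it approximates the tokenizer's behaviour and is not its implementation. The helper expand_nt and the IRREGULAR_STEMS table are names introduced here for illustration, the table is not exhaustive, and only the straight apostrophe (') is handled.

```python
import re

# Irregular stems left behind once "n't" is split off
# (assumed for this sketch, not exhaustive).
IRREGULAR_STEMS = {"ca": "can", "wo": "will", "sha": "shall"}


def expand_nt(text):
    """Rewrite contractions such as "wasn't" as "was not"."""

    def repl(match):
        stem = match.group(1)
        fixed = IRREGULAR_STEMS.get(stem.lower(), stem)
        # Preserve a leading capital, e.g. "Can't" -> "Can not".
        if stem[0].isupper():
            fixed = fixed[0].upper() + fixed[1:]
        return fixed + " not"

    # The group backtracks so that "wasn't" yields the stem "was"
    # and "can't" yields the stem "ca".
    return re.sub(r"\b(\w+)n't\b", repl, text)


print(expand_nt("The movie wasn't that good."))
print(expand_nt("I can't and won't go"))
```

Like any hand-written rule of this kind, it only covers English contractions.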

But negative meaning can also be expressed by "quasi-negative words, like hardly, barely, seldom" and "implied negatives, such as fail, prevent, reluctant, deny, absent"; look into this paper. An even more detailed analysis can be found in Christopher Potts' On the negativity of negation.

As for your initial problem, sentiment analysis: most modern approaches, as far as I know, don't process negation explicitly; instead, they use supervised methods with high-order n-grams. Those that do process negation usually prepend a special prefix, NOT_, to every word between the negation and the next punctuation mark.
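
A minimal sketch of that NOT_ marking, assuming the text is already tokenized with n't as its own token. The NEGATIONS and PUNCTUATION sets and the function name mark_negation_scope are assumptions made for this example, not part of any library.

```python
# Assumed, non-exhaustive lists for this illustration.
NEGATIONS = {"not", "no", "never", "n't", "cannot"}
PUNCTUATION = {".", ",", ":", ";", "!", "?"}


def mark_negation_scope(tokens):
    """Prefix NOT_ to every token between a negation and the next punctuation."""
    out = []
    in_scope = False
    for tok in tokens:
        if tok in PUNCTUATION:
            in_scope = False  # punctuation closes the negation scope
            out.append(tok)
        elif tok.lower() in NEGATIONS:
            in_scope = True  # the negation word itself stays unmarked
            out.append(tok)
        elif in_scope:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
    return out


tokens = ["The", "movie", "was", "n't", "that", "good", ",",
          "but", "I", "stayed", "."]
print(mark_negation_scope(tokens))
```

The marked tokens (NOT_that, NOT_good) can then be fed to a bag-of-words or n-gram classifier as ordinary features. NLTK ships a comparable helper, nltk.sentiment.util.mark_negation, which appends a _NEG suffix rather than a NOT_ prefix.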
