How to extract the verbs and all corresponding adverbs from a text?

Question

Using n-grams in Python, my aim is to find the verbs and their corresponding adverbs in an input text. What I have done:

Input text: "He is talking weirdly. A horse can run fast. A big tree is there. The sun is beautiful. The place is well decorated. They are talking weirdly. She runs fast. She is talking greatly. Jack runs slow."

Code:

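```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
# posTagged: list of (word, tag) pairs, e.g. from nltk.pos_tag()
bigram_measures = BigramAssocMeasures()
finder2 = BigramCollocationFinder.from_words(wrd for (wrd, tags) in posTagged if tags in ('VBG', 'RB', 'VBN'))
scored = finder2.score_ngrams(bigram_measures.raw_freq)
print(sorted(finder2.nbest(bigram_measures.raw_freq, 5)))
```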

From my code, I got the output `[('talking', 'greatly'), ('talking', 'weirdly'), ('weirdly', 'talking'), ('runs', 'fast'), ('runs', 'slow')]`, which is the list of verbs and their corresponding adverbs.

What I am looking for

I want to get each verb together with all of its corresponding adverbs. For example: `'talking'`: `'greatly', 'weirdly'`; `'runs'`: `'fast', 'slow'`; and so on.

Recommended answer

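You already have a list of all verb-adverb bigrams, so you're really asking how to consolidate them into a dictionary that gives all the adverbs for each verb. But first, let's re-create your bigrams more directly: pair each word with the word that actually follows it in the tagged text, and only then keep the verb-adverb pairs. (Filtering the words before pairing them, as your code does, makes words adjacent that were not adjacent in the text.)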

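```python
import nltk

# posTagged: list of (word, tag) pairs, e.g. from nltk.pos_tag()
pairs = list()
for (w1, tag1), (w2, tag2) in nltk.bigrams(posTagged):
    # keep a pair only when a verb (any VB* tag) is directly followed by an adverb
    if tag1.startswith("VB") and tag2 == "RB":
        pairs.append((w1, w2))
```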
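
To see why the order of pairing and filtering matters, here is a small sketch that needs only the standard library. The tagged tokens are a hand-written assumption standing in for `nltk.pos_tag` output (real tags may differ), and `zip(tokens, tokens[1:])` plays the role of `nltk.bigrams`. Filtering first reproduces a `('weirdly', 'talking')` pair like the one in the question's output, because the filter glues words from two different sentences together.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]

# Hand-written (word, tag) pairs: an assumed stand-in for nltk.pos_tag output.
tagged: Tagged = [
    ("They", "PRP"), ("are", "VBP"), ("talking", "VBG"), ("weirdly", "RB"), (".", "."),
    ("She", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("greatly", "RB"), (".", "."),
]


def filter_then_pair(tagged: Tagged) -> List[Tuple[str, str]]:
    """Question's approach: drop unwanted tags first, then pair the survivors."""
    kept = [w for w, t in tagged if t in ("VBG", "RB", "VBN")]
    return list(zip(kept, kept[1:]))


def pair_then_filter(tagged: Tagged) -> List[Tuple[str, str]]:
    """Answer's approach: pair true neighbours first, then keep verb + adverb."""
    return [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1.startswith("VB") and t2 == "RB"
    ]


print(filter_then_pair(tagged))
# [('talking', 'weirdly'), ('weirdly', 'talking'), ('talking', 'greatly')]
print(pair_then_filter(tagged))
# [('talking', 'weirdly'), ('talking', 'greatly')]
```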

Now for your question: we'll build a dictionary that maps each verb to the adverbs that follow it. I'll store the adverbs in a set rather than a list, to get rid of duplicates.

```python
from collections import defaultdict

consolidated = defaultdict(set)
for verb, adverb in pairs:
    consolidated[verb].add(adverb)
```

The defaultdict provides an empty set for verbs that haven't been seen before, so we don't need to check by hand.
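
As a quick check, here is the consolidation step applied to the verb-adverb pairs from the question's output, typed in by hand rather than computed. The spurious `('weirdly', 'talking')` is left out, and "talking weirdly" is listed twice because it occurs twice in the input text, which shows the set dropping the duplicate. The result is the grouping the question asks for.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Verb-adverb pairs typed in by hand; "talking weirdly" occurs twice in the input.
pairs: List[Tuple[str, str]] = [
    ("talking", "weirdly"), ("runs", "fast"), ("talking", "weirdly"),
    ("talking", "greatly"), ("runs", "slow"),
]

consolidated: Dict[str, Set[str]] = defaultdict(set)
for verb, adverb in pairs:
    consolidated[verb].add(adverb)  # an unseen verb gets a fresh empty set first

# Sets have no defined order, so sort for stable, readable output.
for verb in sorted(consolidated):
    print(verb, "-", sorted(consolidated[verb]))
# runs - ['fast', 'slow']
# talking - ['greatly', 'weirdly']
```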

Depending on the details of your assignment, you might also want to case-fold and lemmatize your verbs so that the adverbs from "Driving recklessly" and "I drove carefully" are recorded together:

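```python
import nltk

wnl = nltk.stem.WordNetLemmatizer()  # needs the WordNet data: nltk.download("wordnet")
...  # build `pairs` and `consolidated` as above
for verb, adverb in pairs:
    # lower-case, then reduce the verb to its base form ("drove" -> "drive")
    verb = wnl.lemmatize(verb.lower(), "v")
    consolidated[verb].add(adverb)
```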
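
The effect of case-folding plus lemmatizing can be sketched without NLTK. In the sketch below, `LEMMAS` and `lemmatize_verb` are hypothetical, hand-written stand-ins for `wnl.lemmatize(verb.lower(), "v")`; the table only knows the two example verbs, and any other word is simply returned lower-cased. It shows both adverbs ending up under one key.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for WordNetLemmatizer().lemmatize(word, "v");
# it only knows the two example verbs.
LEMMAS: Dict[str, str] = {"driving": "drive", "drove": "drive"}


def lemmatize_verb(word: str) -> str:
    """Case-fold, then map to a base form; unknown words are returned lower-cased."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


# Pairs as they would come out of "Driving recklessly" and "I drove carefully".
pairs: List[Tuple[str, str]] = [("Driving", "recklessly"), ("drove", "carefully")]

consolidated: Dict[str, Set[str]] = defaultdict(set)
for verb, adverb in pairs:
    consolidated[lemmatize_verb(verb)].add(adverb)

print({verb: sorted(adverbs) for verb, adverbs in consolidated.items()})
# {'drive': ['carefully', 'recklessly']}
```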
