How to extract the verbs and all corresponding adverbs from a text?
Problem description
Using n-grams in Python, my aim is to find the verbs and their corresponding adverbs in an input text. What I have done:
Input text: "He is talking weirdly. A horse can run fast. A big tree is there. The sun is beautiful. The place is well decorated. They are talking weirdly. She runs fast. She is talking greatly. Jack runs slow."
Code:
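import nltk
from nltk.collocations import BigramCollocationFinder
bigram_measures = nltk.collocations.BigramAssocMeasures()
finder2 = BigramCollocationFinder.from_words(wrd for (wrd, tags) in posTagged if tags in ('VBG', 'RB', 'VBN'))
scored = finder2.score_ngrams(bigram_measures.raw_freq)
print(sorted(finder2.nbest(bigram_measures.raw_freq, 5)))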
From my code, I got the output:
[('talking', 'greatly'), ('talking', 'weirdly'), ('weirdly', 'talking'), ('runs', 'fast'), ('runs', 'slow')]
which is the list of verbs and their corresponding adverbs.
What I am looking for
I want to get each verb together with all of its corresponding adverbs from this list. For example: ('talking': 'greatly', 'weirdly'), ('runs': 'fast', 'slow'), etc.
Recommended answer
You already have a list of all verb-adverb bigrams, so you're just asking how to consolidate them into a dictionary that gives all adverbs for each verb. But first, let's re-create your bigrams in a more direct way:
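import nltk

# posTagged is the list of (word, tag) pairs from the question
pairs = []
for (w1, tag1), (w2, tag2) in nltk.bigrams(posTagged):
    # keep a verb (any VB* tag) that is immediately followed by an adverb
    if tag1.startswith("VB") and tag2 == "RB":
        pairs.append((w1, w2))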
Now for your question: we'll build a dictionary of the adverbs that follow each verb. I'll store the adverbs in a set rather than a list, to get rid of duplicates.
from collections import defaultdict

# maps each verb to the set of adverbs seen right after it
consolidated = defaultdict(set)
for verb, adverb in pairs:
    consolidated[verb].add(adverb)
The defaultdict provides an empty set for verbs that haven't been seen before, so we don't need to check by hand.
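To see the two steps working together without installing NLTK, here is a minimal sketch. It assumes a short hand-written list of (word, tag) pairs in place of real tagger output (the name `tagged` and the tags themselves are illustrative, and a real tagger may label words such as "fast" or "slow" differently), and it uses `zip(tagged, tagged[1:])` as a plain-Python stand-in for `nltk.bigrams`.

```python
from collections import defaultdict

# Hand-written (word, tag) pairs standing in for nltk.pos_tag() output.
# The tags are assumptions for illustration; a real tagger may differ.
tagged = [
    ("He", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("weirdly", "RB"), (".", "."),
    ("She", "PRP"), ("runs", "VBZ"), ("fast", "RB"), (".", "."),
    ("She", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("greatly", "RB"), (".", "."),
    ("Jack", "NNP"), ("runs", "VBZ"), ("slow", "RB"), (".", "."),
]

# zip(tagged, tagged[1:]) yields the same adjacent pairs as nltk.bigrams(tagged).
pairs = [
    (w1, w2)
    for (w1, tag1), (w2, tag2) in zip(tagged, tagged[1:])
    if tag1.startswith("VB") and tag2 == "RB"
]

# Group the adverbs by the verb they follow; sets drop duplicates.
consolidated = defaultdict(set)
for verb, adverb in pairs:
    consolidated[verb].add(adverb)

# Sets are unordered, so sort the adverbs for a stable display.
for verb in sorted(consolidated):
    print(verb, sorted(consolidated[verb]))
```

Sorting at print time keeps the stored data as sets, which is what removes the duplicates, while still giving a readable listing per verb.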
Depending on the details of your assignment, you might also want to case-fold and lemmatize your verbs, so that the adverbs from "Driving recklessly" and "I drove carefully" are recorded together:
wnl = nltk.stem.WordNetLemmatizer()
...
for verb, adverb in pairs:
    # lowercase the verb, then reduce it to its base form ("v" = verb)
    verb = wnl.lemmatize(verb.lower(), "v")
    consolidated[verb].add(adverb)
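The effect of that normalization can be sketched without the WordNet data. The snippet below assumes a tiny hand-written lookup table, `LEMMAS`, and a helper, `normalize`; both are hypothetical stand-ins for `wnl.lemmatize(verb.lower(), "v")` and are not part of NLTK.

```python
from collections import defaultdict

# Hypothetical stand-in for wnl.lemmatize(word, "v"): a tiny lookup table
# so the sketch runs without downloading the WordNet corpus.
LEMMAS = {"driving": "drive", "drove": "drive"}

def normalize(verb):
    """Case-fold the verb, then map it to its base form if we know one."""
    lowered = verb.lower()
    return LEMMAS.get(lowered, lowered)

# The two verb-adverb pairs from the example sentences above.
pairs = [("Driving", "recklessly"), ("drove", "carefully")]

consolidated = defaultdict(set)
for verb, adverb in pairs:
    consolidated[normalize(verb)].add(adverb)

for verb in sorted(consolidated):
    print(verb, sorted(consolidated[verb]))
```

Both adverbs end up under the single key "drive". Without the normalization step they would be split across "Driving" and "drove".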