Detecting POS tag patterns along with specified words

Problem description

I need to identify certain POS tags before/after certain specified words. For example, the following tagged sentence:

[('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]

can be abstracted to the form "would be" + Adjective.

Similarly:

[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

is of the form "am able to" + Verb.

How can I check for these kinds of patterns in sentences? I am using NLTK.

Recommended answer

Assuming you want to check literally for "would" followed by "be", followed by some adjective, you can do this:

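def would_be(tagged):
    # Slide a 3-token window over the sentence: literal word, literal word, POS tag.
    # range() replaces the Python 2-only xrange().
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))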

The input is a POS-tagged sentence (a list of (word, tag) tuples, as produced by NLTK).

It checks whether the list contains three consecutive elements such that "would" is immediately followed by "be", and "be" is immediately followed by a word tagged as an adjective ('JJ'). It returns True as soon as this pattern is matched.

You can do something very similar for the second type of sentence:

def am_able_to(tagged):
    # Slide a 4-token window: three literal words, then the POS tag of the fourth token.
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

Here's a driver for the program:

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

def would_be(tagged):
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))

def am_able_to(tagged):
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))

This correctly outputs:

Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True

If you'd like to generalize this, you can change, for each position in the pattern, whether you're checking the literal word or its POS tag.
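One possible way to sketch that generalization is a single matcher that takes the pattern as data. This is not part of the original answer: the function name `matches_pattern`, the `('word', ...)` / `('tag', ...)` pattern encoding, and the constants `WOULD_BE_ADJ` and `AM_ABLE_TO_VERB` are all hypothetical names introduced here for illustration. Like the functions above, it compares words and tags by exact equality.

```python
def matches_pattern(tagged, pattern):
    """Return True if `pattern` occurs as a contiguous run in `tagged`.

    `tagged` is a list of (word, tag) tuples.
    `pattern` is a list of (kind, value) pairs (a hypothetical encoding):
    kind 'word' compares the token text, kind 'tag' compares the POS tag.
    """
    def element_matches(token, element):
        word, tag = token
        kind, value = element
        if kind == 'word':
            return word == value
        if kind == 'tag':
            return tag == value
        raise ValueError("unknown pattern kind: %r" % (kind,))

    n = len(pattern)
    # Try every window of n consecutive tokens; stop at the first full match.
    return any(
        all(element_matches(tagged[i + j], pattern[j]) for j in range(n))
        for i in range(len(tagged) - n + 1)
    )


# The two patterns from the question, expressed as data.
WOULD_BE_ADJ = [('word', 'would'), ('word', 'be'), ('tag', 'JJ')]
AM_ABLE_TO_VERB = [('word', 'am'), ('word', 'able'), ('word', 'to'), ('tag', 'VB')]

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'),
      ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

print(matches_pattern(s1, WOULD_BE_ADJ))
print(matches_pattern(s2, AM_ABLE_TO_VERB))
```

With this shape, adding a new pattern means writing a new list rather than a new function. Exact tag comparison means 'VB' does not match 'VBP' or 'VBD'; whether to loosen that is a design choice the original answer leaves open.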
