Match POS tag and sequence of words

Question

I have the following two strings with their POS tags:

Sent1: "something like how writer pro or phraseology works would be really cool."

[('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'), ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'), ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]

Sent2: "more options like the syntax editor would be nice"

[('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'), ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ')]

I am looking for a way to detect (return True) whether these strings contain the sequence "would" + "be" + adjective, regardless of the position of the adjective, as long as it comes after "would" "be". In the second string the adjective "nice" immediately follows "would be", but that is not the case in the first string.

The trivial case (no other word before the adjective, as in "would be nice") was solved in an earlier question of mine: "detecting POS tag pattern along with specified words".
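
For reference, one way to express that trivial case in plain Python (not necessarily the solution given in the earlier question) is to slide over consecutive triples of (word, tag) pairs. This sketch assumes Penn Treebank tags, where adjectives are tagged JJ, JJR or JJS; the function name would_be_adj_adjacent is hypothetical.

```python
# Sketch of the trivial case: "would" "be" immediately followed by an adjective.
# Assumes Penn Treebank tags, where adjective tags start with "JJ" (JJ, JJR, JJS).
def would_be_adj_adjacent(tagged):
    """Return True if ('would', _), ('be', _), (_, JJ*) occur consecutively."""
    for (w1, _), (w2, _), (_, t3) in zip(tagged, tagged[1:], tagged[2:]):
        if w1.lower() == "would" and w2.lower() == "be" and t3.startswith("JJ"):
            return True
    return False


sent1 = [('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'),
         ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'),
         ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
sent2 = [('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'),
         ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'),
         ('nice', 'JJ')]

print(would_be_adj_adjacent(sent1))  # False: "really" sits between "be" and "cool"
print(would_be_adj_adjacent(sent2))  # True
```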

I am now looking for a more general solution where optional words may occur before the adjective. I am new to NLTK and Python.
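
A minimal sketch of the general case in plain Python, without any chunker: find "would" immediately followed by "be", then accept the match if any later token carries an adjective tag. It assumes Penn Treebank tags (JJ, JJR, JJS) and one sentence per tagged list; the function name would_be_then_adj is hypothetical.

```python
# Sketch of the general case: "would" immediately followed by "be", then an
# adjective anywhere later in the same sentence, with optional words in between.
# Assumes Penn Treebank tags, where adjective tags start with "JJ".
def would_be_then_adj(tagged):
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    for i in range(len(tagged) - 1):
        if words[i] == "would" and words[i + 1] == "be":
            # Any adjective after "would be" counts, however far away it is.
            if any(t.startswith("JJ") for t in tags[i + 2:]):
                return True
    return False


sent1 = [('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'),
         ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'),
         ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
sent2 = [('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'),
         ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'),
         ('nice', 'JJ')]
# Hypothetical negative example: the only adjective comes before "would be".
no_adj_after = [('nice', 'JJ'), ('it', 'PRP'), ('would', 'MD'), ('be', 'VB')]

print(would_be_then_adj(sent1))         # True
print(would_be_then_adj(sent2))         # True
print(would_be_then_adj(no_adj_after))  # False
```

Because any later adjective counts, this also fires on an adjective that is far from "would be" and unrelated to it; limiting how many tokens may sit in between is a simple variation if that matters.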

Answer

First, install nltk_cli by following the instructions at https://github.com/alvations/nltk_cli

Then, here's a secret function in nltk_cli; maybe you'll find it useful:

alvas@ubi:~/git/nltk_cli$ cat infile.txt 
something like how writer pro or phraseology works would be really cool .
more options like the syntax editor would be nice
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+ADJP infile.txt 
would be    really cool
would be    nice

To illustrate other possible uses:

alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+VP infile.txt 
!!! NO CHUNK of VP+VP in this sentence !!!
!!! NO CHUNK of VP+VP in this sentence !!!
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 NP+VP infile.txt 
how writer pro or phraseology works would be
the syntax editor   would be
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+NP infile.txt 
!!! NO CHUNK of VP+NP in this sentence !!!
!!! NO CHUNK of VP+NP in this sentence !!!

Then, if you want to check whether the phrase occurs in a sentence and output True/False, simply read and iterate through the output of nltk_cli and check it with if-else conditions.
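
A rough sketch of that post-processing step, assuming (as in the transcripts above) that senna.py prints one line per input sentence and a "!!! NO CHUNK ..." line when the pattern is absent. Only the --chunk2 invocation shown above comes from nltk_cli; the helper names chunk_command, run_chunker, has_chunk and would_be_adjp are hypothetical, and run_chunker assumes it is pointed at a local nltk_cli checkout.

```python
import subprocess

NO_CHUNK_PREFIX = "!!! NO CHUNK"  # marker seen in the senna.py output above


def chunk_command(pattern, infile):
    """Build the same command line as in the transcripts, e.g. pattern='VP+ADJP'."""
    return ["python", "senna.py", "--chunk2", pattern, infile]


def run_chunker(pattern, infile, repo_dir):
    """Hypothetical helper: run senna.py inside the nltk_cli checkout, return stdout."""
    result = subprocess.run(chunk_command(pattern, infile), cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout


def has_chunk(output):
    """One bool per non-empty output line: True unless it is a NO CHUNK line."""
    return [not line.startswith(NO_CHUNK_PREFIX)
            for line in output.splitlines() if line.strip()]


def would_be_adjp(output):
    """Stricter check: the output line must start with the words "would be"."""
    return [line.split()[:2] == ["would", "be"]
            for line in output.splitlines() if line.strip()]


# Output copied from the VP+ADJP and VP+NP transcripts above.
vp_adjp_output = "would be    really cool\nwould be    nice\n"
vp_np_output = ("!!! NO CHUNK of VP+NP in this sentence !!!\n"
                "!!! NO CHUNK of VP+NP in this sentence !!!\n")

print(has_chunk(vp_adjp_output))      # [True, True]
print(would_be_adjp(vp_adjp_output))  # [True, True]
print(has_chunk(vp_np_output))        # [False, False]
```

Skipping blank lines keeps the sketch simple, but it relies on the one-line-per-sentence layout seen above; if senna.py ever emits extra lines, the booleans would no longer line up with the input sentences.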
