Match POS tag and sequence of words
Question
I have the following two strings with their POS tags:
Sent1: "something like how writer pro or phraseology works would be really cool."
[('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'), ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'), ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
Sent2: "more options like the syntax editor would be nice"
[('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'), ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ')]
I am looking for a way to detect (return True) whether these strings contain the sequence "would" + "be" + adjective, regardless of the position of the adjective, as long as it comes after "would be". In the second string the adjective "nice" immediately follows "would be", but that is not the case in the first string, where "really" sits between "would be" and "cool".
The trivial case (no other word before the adjective, as in "would be nice") was solved in an earlier question of mine: detecting POS tag pattern along with specified words
I am now looking for a more general solution where optional words may occur before the adjective. I am new to NLTK and Python.
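The requested behavior can be pinned down with a small sketch over the tagged tuples. This is plain Python with no NLTK calls; the function name `has_would_be_adjective` is hypothetical, and it assumes Penn Treebank tags, so JJ, JJR and JJS all count as adjectives. It is a naive reading of the requirement, not a chunker.

```python
def has_would_be_adjective(tagged):
    """Return True if 'would' is directly followed by 'be' and some
    adjective (tag starting with 'JJ') appears anywhere after that pair."""
    for i in range(len(tagged) - 1):
        word = tagged[i][0].lower()
        next_word = tagged[i + 1][0].lower()
        if word == "would" and next_word == "be":
            # Anything after a later "would be" is also after this one,
            # so checking the first occurrence is enough.
            return any(tag.startswith("JJ") for _, tag in tagged[i + 2:])
    return False


sent1 = [('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'),
         ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'),
         ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
sent2 = [('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'),
         ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'),
         ('nice', 'JJ')]
# Made-up counter-example: "would be" with no adjective after it.
sent3 = [('it', 'PRP'), ('would', 'MD'), ('be', 'VB'), ('a', 'DT'), ('problem', 'NN')]

print(has_would_be_adjective(sent1))  # True
print(has_would_be_adjective(sent2))  # True
print(has_would_be_adjective(sent3))  # False
```

Because the adjective may appear at any distance, this check also fires on an adjective in an unrelated later clause, which is why a phrase-level approach may be preferable.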
Recommended answer
First, install nltk_cli by following the instructions at https://github.com/alvations/nltk_cli
Then, here's a secret function in nltk_cli; maybe you'll find it useful:
alvas@ubi:~/git/nltk_cli$ cat infile.txt
something like how writer pro or phraseology works would be really cool .
more options like the syntax editor would be nice
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+ADJP infile.txt
would be really cool
would be nice
To illustrate other possible usage:
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+VP infile.txt
!!! NO CHUNK of VP+VP in this sentence !!!
!!! NO CHUNK of VP+VP in this sentence !!!
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 NP+VP infile.txt
how writer pro or phraseology works would be
the syntax editor would be
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+NP infile.txt
!!! NO CHUNK of VP+NP in this sentence !!!
!!! NO CHUNK of VP+NP in this sentence !!!
Then, if you want to check whether the phrase occurs in a sentence and output True/False, simply read and iterate through the outputs from nltk_cli and check them with if-else conditions.
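As a rough sketch of that last step, the snippet below turns lines like the ones printed in the sessions above into True/False. It assumes, based only on the output shown here, that there is one output line per input sentence and that a miss is reported with a line starting with "!!! NO CHUNK". The helper name `has_would_be_chunk` is made up for illustration and is not part of nltk_cli.

```python
# Assumption: a sentence without the requested chunk yields a line that
# starts with this marker, as in the VP+VP and VP+NP sessions above.
NO_CHUNK_MARKER = "!!! no chunk"


def has_would_be_chunk(output_line):
    """Hypothetical helper: True if the line is a chunk that starts with
    the words 'would be', False for empty lines and no-chunk markers."""
    line = output_line.strip().lower()
    if not line or line.startswith(NO_CHUNK_MARKER):
        return False
    return line.split()[:2] == ["would", "be"]


# Lines copied from the --chunk2 VP+ADJP and VP+NP outputs shown above.
vp_adjp_lines = ["would be really cool", "would be nice"]
vp_np_lines = ["!!! NO CHUNK of VP+NP in this sentence !!!",
               "!!! NO CHUNK of VP+NP in this sentence !!!"]

print([has_would_be_chunk(line) for line in vp_adjp_lines])  # [True, True]
print([has_would_be_chunk(line) for line in vp_np_lines])    # [False, False]
```

In practice the lines would come from the output of the VP+ADJP command rather than from hard-coded lists; an ADJP can begin with an adverb such as "really", which is how the first sentence gets matched.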