Match POS tag and sequence of words

Question

I have the following two strings with their POS tags:

Sent1: "something like how writer pro or phraseology works would be really cool."

[('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'), ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'), ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]

Sent2: "more options like the syntax editor would be nice"

[('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'), ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ')]

I am looking for a way to detect (return True) whether these strings contain the sequence "would" + "be" + adjective, regardless of the position of the adjective, as long as it comes after "would" "be". In the second string the adjective "nice" immediately follows "would be", but that is not the case in the first string.

The trivial case (no other word before the adjective; "would be nice") was solved in an earlier question of mine: "detecting POS tag pattern along with specified words".

I am now looking for a more general solution where optional words may occur before the adjective. I am new to NLTK and Python.

Recommended answer

First, install nltk_cli by following the instructions at https://github.com/alvations/nltk_cli

Then, here's a secret function in nltk_cli that you may find useful:

alvas@ubi:~/git/nltk_cli$ cat infile.txt 
something like how writer pro or phraseology works would be really cool .
more options like the syntax editor would be nice
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+ADJP infile.txt 
would be    really cool
would be    nice

To illustrate other possible usage:

alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+VP infile.txt 
!!! NO CHUNK of VP+VP in this sentence !!!
!!! NO CHUNK of VP+VP in this sentence !!!
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 NP+VP infile.txt 
how writer pro or phraseology works would be
the syntax editor   would be
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+NP infile.txt 
!!! NO CHUNK of VP+NP in this sentence !!!
!!! NO CHUNK of VP+NP in this sentence !!!

Then, if you want to check whether the phrase is in a sentence and output True/False, simply read and iterate through the output lines from nltk_cli and test each one with an if-else condition.
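The if-else check could look something like the sketch below. It assumes the `--chunk2 VP+ADJP` output has already been captured as a list of text lines in the format shown above, with the VP chunk first, then whitespace, then the ADJP chunk. The function name `has_would_be_adjp` and the `sample_output` list are made up for illustration and are not part of nltk_cli.

```python
# Prefix of the line senna.py prints when no chunk pair is found
# (copied from the sample output above).
NO_CHUNK_PREFIX = "!!! NO CHUNK"


def has_would_be_adjp(output_line):
    """Return True if one VP+ADJP output line is 'would be' followed by an ADJP."""
    line = output_line.strip()
    if not line or line.startswith(NO_CHUNK_PREFIX):
        return False
    # Assumption: the VP chunk comes first, then whitespace, then the ADJP chunk.
    tokens = line.lower().split()
    # Require "would be" plus at least one more token (the adjective phrase).
    return len(tokens) > 2 and tokens[:2] == ["would", "be"]


# Hypothetical captured output: the two lines shown above plus a no-match line.
sample_output = [
    "would be    really cool",
    "would be    nice",
    "!!! NO CHUNK of VP+ADJP in this sentence !!!",
]

results = [has_would_be_adjp(line) for line in sample_output]
print(results)  # [True, True, False]
```

Because the chunker already groups "really cool" into one ADJP, the check does not need to know how many words sit between "would be" and the adjective.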
