How to get a common tag pattern for a list of sentences in Python with NLTK


Question

I have a list of sentences. With NLTK I can tag a sentence and get the tag pattern of that sentence, so I can get the tag patterns for the whole list this way. What I want, however, is to identify the common tag patterns that most of the sentences match. For example:

  • What is encapsulation

    tag pattern : {<WP><VBZ><NN>}

  • How was your wedding

    tag pattern : {<WRB><VBD><PRP$><NN>}
    

  • What is your plan today

    tag pattern : {<WP><VBZ><PRP$><NN><NN>}
    

So the common tag pattern (combining them, regexp-tagger style) for the above three sentences is:

    {<W.+><V.+><PRP.?>?<NN>+} - one "Wh" word, one verb, zero or one pronoun, one or more nouns
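To make the example concrete, here is a minimal sketch that checks the generalized pattern against the three tag sequences. It uses only the standard `re` module. The helper names (`tag_string`, `compile_tag_pattern`) are hypothetical, the sentences are hand-tagged with the tags listed above rather than run through a tagger, and the translation of the pattern into a regular expression is a simplified stand-in, not NLTK's own implementation.

```python
import re

# Hand-tagged sentences in the (word, tag) shape a POS tagger returns.
# The tags are copied from the examples above; no tagger is run here.
TAGGED = [
    [("What", "WP"), ("is", "VBZ"), ("encapsulation", "NN")],
    [("How", "WRB"), ("was", "VBD"), ("your", "PRP$"), ("wedding", "NN")],
    [("What", "WP"), ("is", "VBZ"), ("your", "PRP$"), ("plan", "NN"), ("today", "NN")],
]


def tag_string(tagged_sentence):
    """Turn [(word, tag), ...] into a string such as '<WP><VBZ><NN>'."""
    return "".join("<%s>" % tag for _, tag in tagged_sentence)


def _tag_to_group(match):
    # Inside one tag, '.' must not run across the '<' and '>' delimiters.
    inner = match.group(1).replace(".", "[^<>]")
    # Group the whole tag so a trailing ?, + or * applies to the tag.
    return "(?:<(?:" + inner + ")>)"


def compile_tag_pattern(pattern):
    """Compile a pattern like '{<W.+><V.+>}' into a regex over tag strings."""
    body = pattern.strip("{}")
    return re.compile(re.sub(r"<([^<>]+)>", _tag_to_group, body))


COMMON = compile_tag_pattern("{<W.+><V.+><PRP.?>?<NN>+}")

for sentence in TAGGED:
    sequence = tag_string(sentence)
    print(sequence, bool(COMMON.fullmatch(sequence)))
```

This only verifies a pattern that was written by hand; it does not derive the pattern from the data, which is the harder part of the question.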
    

So I want to generalize the tag patterns of individual sentences into common ones. That is what I am trying to do.

Can someone tell me how to do that?

Recommended answer

It sounds like you are after a regexp (with quantifiers) that will match all the different tag sequences in your data. While this is not an easy problem, I suspect that your goal is to find a pattern that captures the sequences that are legal sentences. Is that right?

If so, regexps (and finite-state approaches in general) are inherently the wrong tool for the job. To even get a start on characterizing your sentence collection, you need to look at context-free grammars. Take a look at NLTK's materials on the topic.
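As a rough illustration of what a context-free grammar over tag sequences could look like, the sketch below defines a toy grammar and a small recursive recognizer in plain Python. The grammar rules, the `GRAMMAR` table, and the function names are all assumptions made up for this example and cover only the three questions from the post. NLTK ships its own grammar and parser classes (for example `nltk.CFG` and `nltk.ChartParser`); they are deliberately not used here so the sketch runs without extra dependencies.

```python
# Toy context-free grammar over POS tags. This is an assumption for
# illustration, written to cover only the three example questions.
GRAMMAR = {
    "Q":  [["WH", "V", "NP"]],
    "NP": [["PRP$", "N"], ["N"]],
    "N":  [["NN", "N"], ["NN"]],   # right-recursive rule: one or more nouns
    "WH": [["WP"], ["WRB"]],
    "V":  [["VBZ"], ["VBD"]],
}


def expand(symbol, tags, start):
    """Yield every end index such that tags[start:end] derives from symbol."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must equal the POS tag at this position.
        if start < len(tags) and tags[start] == symbol:
            yield start + 1
        return
    for production in GRAMMAR[symbol]:
        yield from expand_sequence(production, tags, start)


def expand_sequence(symbols, tags, start):
    """Yield every end index reachable by matching symbols in order."""
    if not symbols:
        yield start
        return
    for middle in expand(symbols[0], tags, start):
        yield from expand_sequence(symbols[1:], tags, middle)


def is_question(tags):
    """Return True if the whole tag sequence derives from the start symbol Q."""
    return any(end == len(tags) for end in expand("Q", tags, 0))


print(is_question(["WP", "VBZ", "NN"]))
print(is_question(["WRB", "VBD", "PRP$", "NN"]))
print(is_question(["DT", "NN"]))
```

The recognizer backtracks naively and would loop on left-recursive rules, so it is only suitable for small hand-written grammars like this one; a chart parser is the usual choice for anything larger.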
