How to match integers in NLTK CFG?

Question

If I want to define a grammar in which one of the tokens matches an integer, how can I achieve that with NLTK's string-based CFG?

For example:

S -> SK SO FK
SK -> 'SELECT'
SO -> '\d+'
FK -> 'FROM'

Recommended answer

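Quoted terminals in an NLTK CFG are literal strings, not regular expressions, so '\d+' will not match a number. One option is to create a number phrase that lists each number explicitly: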

import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '10'
""")

sent = 'I shot 3 elephants'.split()
parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    print(tree)

[out]:

(S (NP I) (VP (V shot) (NP (NUM 3) (N elephants))))
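
The NUM rule above is typed out by hand. If the range of integers is known in advance, the alternatives can be generated rather than written manually. The following is only a sketch under that assumption: the upper bound of 999 and the names MAX_INT and num_rule are illustrative and not part of the original answer.

```python
import nltk

MAX_INT = 999  # assumed upper bound, chosen only for illustration

# Build the text of the rule: NUM -> '0' | '1' | ... | '999'
num_rule = "NUM -> " + " | ".join(f"'{i}'" for i in range(MAX_INT + 1))

grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
""" + num_rule)

parser = nltk.ChartParser(grammar)
trees = list(parser.parse('I shot 333 elephants'.split()))
for tree in trees:
    print(tree)
```

This keeps the real number in the tree, but it still rejects any integer outside the generated range, and the grammar grows with the bound.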

But note that this only handles the integers listed in the NUM rule (0 to 10 here). So let's try collapsing every integer into a single placeholder token, e.g. '#NUM#':

import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '#NUM#'
""")

sent = 'I shot 333 elephants'.split()
# Replace every all-digit token with the placeholder the grammar knows.
sent = ['#NUM#' if i.isdigit() else i for i in sent]

parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    print(tree)

[out]:

(S (NP I) (VP (V shot) (NP (NUM #NUM#) (N elephants))))
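
As an alternative to rewriting the input, the grammar itself can be extended per sentence with one NUM production for each integer token that actually occurs. This is a hedged sketch rather than part of the original answer: the helper name grammar_with_numbers is hypothetical, and the placeholder rule is kept in the base grammar only so that NUM is always defined.

```python
import nltk
from nltk.grammar import CFG, Nonterminal, Production

base_grammar = CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '#NUM#'
""")

def grammar_with_numbers(base, tokens):
    """Return a new grammar: `base` plus NUM -> tok for each all-digit token."""
    extra = [Production(Nonterminal('NUM'), [tok])
             for tok in sorted(set(tokens)) if tok.isdigit()]
    return CFG(base.start(), list(base.productions()) + extra)

sent = 'I shot 333 elephants'.split()
parser = nltk.ChartParser(grammar_with_numbers(base_grammar, sent))
trees = list(parser.parse(sent))
for tree in trees:
    print(tree)
```

Here the tree contains 333 directly, at the cost of building a new grammar and parser for each sentence.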

To put the original numbers back into the printed tree, replace the placeholders one at a time, in order:

import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '#NUM#'
""")

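original_sent = 'I shot 333 elephants'.split()
# Replace every all-digit token with the placeholder the grammar knows.
sent = ['#NUM#' if i.isdigit() else i for i in original_sent]
# Remember the original numbers in sentence order.
numbers = [i for i in original_sent if i.isdigit()]

parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    treestr = str(tree)
    # Substitute the placeholders one at a time, left to right.
    for n in numbers:
        treestr = treestr.replace('#NUM#', n, 1)
    print(treestr)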

[out]:

(S (NP I) (VP (V shot) (NP (NUM 333) (N elephants))))
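
The string replacement above yields text, not a tree. If the Tree object itself is needed afterwards, the original tokens can be written back into its leaves by position. A minimal sketch, assuming the parser returns ordinary mutable nltk Tree objects (the default for ChartParser) and that the leaves appear in the same left-to-right order as the input tokens:

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '#NUM#'
""")

original_sent = 'I shot 333 elephants'.split()
sent = ['#NUM#' if tok.isdigit() else tok for tok in original_sent]

parser = nltk.ChartParser(grammar)
trees = list(parser.parse(sent))
for tree in trees:
    # Pair each leaf position with the token that originally stood there
    # and overwrite the leaf, which restores the numbers in place.
    for pos, tok in zip(tree.treepositions('leaves'), original_sent):
        tree[pos] = tok
    print(tree)
```

Because each leaf is matched by position, this also avoids accidentally replacing a '#NUM#' string that was present in the input for some other reason.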
