Using integers/dates as terminals in NLTK parser

Published: 2020/5/18 1:18:29 · Tags: python, regex, parsing, nltk, earley-parser

Question

I'm trying to use the Earley parser in NLTK to parse sentences such as:

If date before 12/21/2010 then Serial = 10

To do this, I'm trying to write a CFG, but the problem is that I would need a general format for dates and integers as terminals, instead of specific values. Is there any way to specify the right-hand side of a production rule as a regular expression, which would allow this kind of processing?

Something like:

S -> '[0-9]+'

which would handle all integers.
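NLTK's CFG terminals are matched literally against tokens, so a rule like `S -> '[0-9]+'` only accepts the literal token `[0-9]+`. One common workaround, which is not the approach used in the answer, is to run the regular expressions yourself before parsing and replace each matching token with a placeholder class name that the grammar treats as an ordinary terminal. The sketch below assumes NLTK 3 and a month/day/year date format; `TOKEN_CLASSES`, `classify`, `DATE_TOK` and `INT_TOK` are hypothetical names chosen for illustration.

```python
import re
import nltk
from nltk.parse.earleychart import EarleyChartParser

# Hypothetical placeholder classes: a token that fully matches a pattern is
# replaced by the class name before parsing. Order matters: dates are tried first.
TOKEN_CLASSES = [
    ("DATE_TOK", re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),  # assumes M/D/YYYY
    ("INT_TOK", re.compile(r"\d+")),
]

def classify(token):
    """Return the placeholder for a token matching one of the patterns, else the token."""
    for name, pattern in TOKEN_CLASSES:
        if pattern.fullmatch(token):
            return name
    return token

# The grammar refers to the placeholders as plain literal terminals.
grammar = nltk.CFG.fromstring("""
S -> 'if' 'date' 'before' DATE 'then' 'serial' '=' INT
DATE -> 'DATE_TOK'
INT -> 'INT_TOK'
""")

parser = EarleyChartParser(grammar)
tokens = "If date before 12/21/2010 then Serial = 10".lower().split()
trees = list(parser.parse([classify(t) for t in tokens]))
for tree in trees:
    print(tree)
```

The tree leaves are the placeholders, not the original values; because classification is one-to-one, the actual date and integer can be read back from `tokens` at the same positions.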

Answer

For this to work, you'll need to tokenize the date so that each digit and slash is a separate token.
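A minimal tokenizer sketch for this step, assuming words are runs of ASCII letters: each digit becomes its own token, words stay whole, and any other non-space character (such as `/` or `=`) becomes a single-character token. The `tokenize` name and the regular expression are illustrative choices, not part of NLTK.

```python
import re

# One token per digit, whole ASCII words, and single-character tokens for
# any other non-space character (slashes, '=', ...). Whitespace is skipped.
TOKEN_RE = re.compile(r"\d|[A-Za-z]+|[^\sA-Za-z\d]")

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize("1/10/1987"))
# ['1', '/', '1', '0', '/', '1', '9', '8', '7']
```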

```python
import nltk
from nltk.parse.earleychart import EarleyChartParser

# nltk.parse_cfg() was removed in NLTK 3; nltk.CFG.fromstring() replaces it.
grammar = nltk.CFG.fromstring("""
DATE -> MONTH SEP DAY SEP YEAR
SEP -> "/"
MONTH -> DIGIT | DIGIT DIGIT
DAY -> DIGIT | DIGIT DIGIT
YEAR -> DIGIT DIGIT DIGIT DIGIT
DIGIT -> '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
""")

parser = EarleyChartParser(grammar)
# In NLTK 3, parse() returns an iterator of trees, so print each one.
for tree in parser.parse(["1", "/", "1", "0", "/", "1", "9", "8", "7"]):
    print(tree)
```

The output is:

```
(DATE
  (MONTH (DIGIT 1))
  (SEP /)
  (DAY (DIGIT 1) (DIGIT 0))
  (SEP /)
  (YEAR (DIGIT 1) (DIGIT 9) (DIGIT 8) (DIGIT 7)))
```

This also allows some flexibility: the month and the day may each be written with one or two digits.
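The same digit-level idea also covers the integers from the question. The sketch below, assuming NLTK 3, adds a hypothetical right-recursive rule `INT -> DIGIT | DIGIT INT` (so an integer of any length parses) and a hypothetical top-level `S` rule for the example sentence; the regex tokenizer that splits digits and lowercases words is likewise an illustrative assumption.

```python
import re
import nltk
from nltk.parse.earleychart import EarleyChartParser

# The answer's date rules, plus a recursive INT rule and an S rule for
# the example sentence (both hypothetical extensions).
grammar = nltk.CFG.fromstring("""
S -> 'if' 'date' 'before' DATE 'then' 'serial' '=' INT
DATE -> MONTH SEP DAY SEP YEAR
SEP -> '/'
MONTH -> DIGIT | DIGIT DIGIT
DAY -> DIGIT | DIGIT DIGIT
YEAR -> DIGIT DIGIT DIGIT DIGIT
INT -> DIGIT | DIGIT INT
DIGIT -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
""")

def tokenize(text):
    # One token per digit, whole lowercase words, single-character punctuation.
    return re.findall(r"\d|[a-z]+|[^\sa-z\d]", text.lower())

parser = EarleyChartParser(grammar)
tokens = tokenize("If date before 12/21/2010 then Serial = 10")
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

Since the slashes fix where MONTH, DAY and YEAR end, the sentence has a single parse. A date with a two-digit year such as `1/5/10` yields no parse, because YEAR requires exactly four digits.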
