Identifying dates in strings using NLTK


Question

I'm trying to identify whether a date occurs in an arbitrary string. Here's my code:

import nltk

txts = ['Submitted on 1st January',
        'Today is 1/3/15']

def chunk(t):
    # Tokenize, POS-tag, then run NLTK's named-entity chunker.
    w_tokens = nltk.word_tokenize(t)
    pt = nltk.pos_tag(w_tokens)
    ne = nltk.ne_chunk(pt)
    print(ne)

for t in txts:
    print(t)
    chunk(t)

The output I get is:

Submitted on 1st January
(S (GPE Submitted/NNP) on/IN 1st/CD January/NNP)
Today is 1/3/15
(S Today/NN is/VBZ 1/3/15/CD)

Clearly the dates are not being tagged. Does anyone know how to get dates tagged?

Thanks

Recommended answer

I took the date example 1/1/70 from your comment. This regex also finds dates that are formatted differently, such as 1970/01/20 or 2-21-79:

import re
x = 'asdfasdf sdf5sdf asd78fsadf 1/1/70 dfsdg fghdfgh 1970/01/20 gfh5fghh sdfgsdg 2-21-79 sdfgsdgf'
# Three digit groups joined by any single non-whitespace character.
print(re.findall(r'\d+\S\d+\S\d+', x))

Output:

['1/1/70', '1970/01/20', '2-21-79']
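
One caveat: \S matches any non-whitespace character, digits included, so the pattern above also matches a plain run of five or more digits. The following is a minimal sketch of a stricter variant, assuming dates only use "/" or "-" as separators. The names NUMERIC_DATE and find_numeric_dates are illustrative and are not part of the original answer.

```python
import re

# Assumption: a numeric date is three digit groups separated by "/" or "-".
# The \b anchors keep the match from starting or ending inside a longer token.
NUMERIC_DATE = re.compile(r'\b\d{1,4}[/-]\d{1,2}[/-]\d{1,4}\b')

def find_numeric_dates(text):
    """Return all substrings of text that look like numeric dates."""
    return NUMERIC_DATE.findall(text)

x = 'asdfasdf sdf5sdf asd78fsadf 1/1/70 dfsdg fghdfgh 1970/01/20 gfh5fghh sdfgsdg 2-21-79 sdfgsdgf'
print(find_numeric_dates(x))                        # ['1/1/70', '1970/01/20', '2-21-79']
print(find_numeric_dates('order 1234567 shipped'))  # []
```

This only checks the shape of the text. It does not verify that the numbers form a real calendar date, so something like 99/99/99 would still be reported.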

Or, for dates written with a month name:

y = 'Asdfasdf Ddf5sdf asd78fsadf Jan 3 dfsdg fghdfgh February 10 sdfgsdgf'
# A capitalized word followed by whitespace and a number.
print(re.findall(r'[A-Z]\w+\s\d+', y))

Output:

['Jan 3', 'February 10']
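
The capitalized-word pattern accepts any capitalized word followed by a number (for example "Room 101"), and it misses day-first forms such as "1st January" from the question. The following is a rough sketch that restricts the match to English month names and accepts both orders. MONTH, DAY, MONTH_DATE and find_month_dates are hypothetical names introduced here for illustration only.

```python
import re

# Assumption: English month names, either full or as three-letter abbreviations.
MONTH = (r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|'
         r'Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)')
# Day of month with an optional ordinal suffix, e.g. "3", "1st", "22nd".
DAY = r'\d{1,2}(?:st|nd|rd|th)?'
# Match either "Month Day" or "Day Month"; all groups are non-capturing,
# so findall returns the whole matched text.
MONTH_DATE = re.compile(rf'\b(?:{MONTH}\s+{DAY}|{DAY}\s+{MONTH})\b')

def find_month_dates(text):
    """Return all substrings of text that look like month-name dates."""
    return MONTH_DATE.findall(text)

y = 'Asdfasdf Ddf5sdf asd78fsadf Jan 3 dfsdg fghdfgh February 10 sdfgsdgf'
print(find_month_dates(y))                           # ['Jan 3', 'February 10']
print(find_month_dates('Submitted on 1st January'))  # ['1st January']
print(find_month_dates('Room 101'))                  # []
```

Like the regexes above, this does not check that the day is valid for the month, and it ignores years and lowercase month names.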
