How to retrieve all kinds of dates and temporal values from text
I want to retrieve dates and other temporal entities from a set of strings. Can this be done in Java without parsing each string against fixed date patterns, given that most parsers only handle a limited range of input patterns? The input here is entered manually and is therefore ambiguous.
Inputs can look like:
12th Sep | mid-March | 12.September.2013
Sep 12th | 12th September | 2013
Sept 13 | 12th, September | 12th, Feb, 2013
I've gone through many answers on finding dates in Java, but most of them don't deal with such a wide range of input patterns.
I've tried the SimpleDateFormat class, calling parse() and treating a parse failure as a sign that the string is not a date. I've tried regex, but I'm not sure it is a good fit for this scenario. I've also used ClearNLP to annotate the dates, but it doesn't give a reliable annotation set.
The closest approach to getting these values could be a Chain of Responsibility, as mentioned below. Is there a library that ships a set of date patterns I could use?
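For illustration, the SimpleDateFormat idea combined with a chain of patterns could be sketched roughly as follows. This is only a minimal sketch, not a complete solution: the class name DatePatternChain, the pattern list, and the normalise step (stripping ordinal suffixes and commas) are all assumptions made up for this example, and inputs such as "mid-March" fall through the chain unmatched.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: tries a list of patterns in order, first full match wins.
final class DatePatternChain {

    // Assumed pattern list, ordered from most to least specific.
    // When parsing, SimpleDateFormat accepts both short and long month
    // names for "MMM", so "Sep" and "September" share one pattern.
    private static final List<String> PATTERNS = Arrays.asList(
            "dd.MMM.yyyy", "dd MMM yyyy", "MMM dd yyyy", "dd MMM", "MMM dd", "yyyy");

    // Strip ordinal suffixes ("12th" -> "12"), turn commas into spaces,
    // and collapse repeated whitespace.
    static String normalise(String raw) {
        return raw.trim()
                .replaceAll("(?i)(?<=\\d)(st|nd|rd|th)\\b", "")
                .replace(',', ' ')
                .replaceAll("\\s+", " ");
    }

    // Returns the first pattern that consumes the whole normalised input.
    static Optional<String> matchingPattern(String raw) {
        String text = normalise(raw);
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false);
            ParsePosition position = new ParsePosition(0);
            // parse(String, ParsePosition) returns null on failure instead of throwing.
            if (format.parse(text, position) != null
                    && position.getIndex() == text.length()) {
                return Optional.of(pattern);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String input : Arrays.asList(
                "12th Sep", "12.September.2013", "12th, Feb, 2013", "mid-March")) {
            System.out.println(input + " -> "
                    + matchingPattern(input).orElse("no pattern matched"));
        }
    }
}
```

Requiring the parse position to reach the end of the string matters here: without it, a pattern like "dd MMM" would also accept "12 Sep 2013" and silently ignore the year. Vague expressions such as "mid-March" are outside what a pattern list like this can cover.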
Yes! I've finally managed to extract all sorts of date/temporal values, from ones as generic as:
mid-March | Last Month | 9/11
to ones as specific as:
11/11/11 11:11:11
This was possible thanks to the excellent libraries from GATE and JAPE.
I created a more lenient annotation rule in JAPE, named 'DateEnhanced', to include certain kinds of dates such as "9/11" or "11TH, February- 2001", and chained Java regexes on the R.H.S. of the 'DateEnhanced' JAPE rule to filter out some unwanted output.
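To give an idea of what chaining regexes to filter out unwanted output can look like, here is a minimal standalone Java sketch. It is not the actual JAPE rule and does not use the GATE API; the class name CandidateFilter and both reject patterns are hypothetical, chosen only to show the shape of the filtering step applied to lenient candidates such as "9/11".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical post-filter for lenient date candidates.
final class CandidateFilter {

    // Assumed reject rules; a candidate is dropped if ANY of them matches fully.
    private static final List<Pattern> REJECT = Arrays.asList(
            // A zero day or month part, e.g. "0/5" or "5/00".
            Pattern.compile("0+/.*|.*/0+"),
            // Two parts that are both above 12, so neither can be a month.
            Pattern.compile("(1[3-9]|[2-9]\\d)/(1[3-9]|[2-9]\\d)"));

    // Runs every candidate through the chain of reject patterns.
    static List<String> keepPlausible(List<String> candidates) {
        return candidates.stream()
                .filter(c -> REJECT.stream().noneMatch(p -> p.matcher(c).matches()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("9/11", "24/36", "0/5", "11/11/11");
        System.out.println(keepPlausible(candidates));
    }
}
```

Keeping the lenient rule and the filters separate means the annotation step can stay broad, while each unwanted case gets its own small regex that is easy to add or remove.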