Extracting dates from text in Java
Question
Is it possible to extract dates from a string in Java?
I have 500+ strings containing different data. They can contain:
"... period from 08.23.2011 - 09.05.2011..."
and also:
"...period ends 06.09.2011...".
It's not certain that the strings above are present, but they can be.
Is it possible to extract the 3 dates and get them in Date format?
Recommended answer
In essence, regex is the answer for recognition, but there are lots and lots of ways to express dates and time periods, so if you want a good solution, you probably want to use an existing, well-tuned set of regexes. There is then a second phase of interpretation, which needs more flexibility than what JodaTime will parse out of the box. So for a robust solution, you probably want to use one of the systems that have been built in the natural language processing community, such as SUTime, HeidelTime or GUTime.
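For the narrow case shown in the question, the two phases (recognition by regex, then interpretation) can be sketched with the JDK alone. This is only a minimal illustration, not a replacement for the NLP systems named above. It uses java.time (Java 8+) rather than JodaTime, the class and method names (DateExtractor, extractDates) are hypothetical, and it assumes every date is written month-first as MM.dd.yyyy, which 08.23.2011 suggests but the question does not confirm. Under that assumption 06.09.2011 is read as June 9, 2011.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class DateExtractor {

    // Recognition phase: find candidates shaped like 2 digits, dot, 2 digits, dot, 4 digits.
    private static final Pattern CANDIDATE =
            Pattern.compile("\\b\\d{2}\\.\\d{2}\\.\\d{4}\\b");

    // Interpretation phase. ASSUMPTION: dates are month-first (MM.dd.yyyy).
    // STRICT resolution rejects impossible values such as month 13 or February 30;
    // it requires the "uuuu" year field rather than "yyyy".
    private static final DateTimeFormatter MONTH_FIRST =
            DateTimeFormatter.ofPattern("MM.dd.uuuu")
                    .withResolverStyle(ResolverStyle.STRICT);

    // Returns every valid date found in the text, in order of appearance.
    static List<LocalDate> extractDates(String text) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher matcher = CANDIDATE.matcher(text);
        while (matcher.find()) {
            try {
                dates.add(LocalDate.parse(matcher.group(), MONTH_FIRST));
            } catch (DateTimeParseException e) {
                // The candidate looked like a date but is not a valid one; skip it.
            }
        }
        return dates;
    }
}
```

Calling DateExtractor.extractDates("... period from 08.23.2011 - 09.05.2011...") returns the two LocalDate values 2011-08-23 and 2011-09-05. A string with no matching pattern returns an empty list, which covers the case where the dates may be absent. If a legacy java.util.Date is required, each result can be converted with Date.from(date.atStartOfDay(ZoneId.systemDefault()).toInstant()).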