After using parsedatetime to get a time structure from the input string, how does one slice the rest of the string out?

Question

I'm wondering how to use parsedatetime for Python to return both the time struct and the rest of the input string, with just the date/time portion removed.

For example:

import parsedatetime
p = parsedatetime.Calendar()
p.parse("Soccer with @homies at Payne Whitney at 2 pm")

returns:

(time.struct_time(tm_year=2020, tm_mon=1, tm_mday=12, tm_hour=13, tm_min=9, tm_sec=59, tm_wday=6, tm_yday=12, tm_isdst=0), 0)

but I'd also like it to return:

"Soccer with @homies at Payne Whitney"

Is there a way to do that with parsedatetime, or would it require a different Python package?

PS

I promise this has a practical application; we're using it to build this: magical.app

Recommended answer

The only method of Calendar that returns that information is nlp() (which I suppose stands for Natural Language Processing). Here is a function that returns all parts of the input, both the parsed date/time segments and the plain text around them:

import parsedatetime

calendar = parsedatetime.Calendar()

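def parse(string, source_time=None):
    """Split `string` into (datetime or None, status, text) parts using Calendar.nlp()."""
    ret = []
    # nlp() returns a tuple of (datetime, status, start, stop, matched text), or None
    parsed_parts = calendar.nlp(string, source_time)
    if parsed_parts:
        last_stop = 0
        for dt, status, start, stop, segment in parsed_parts:
            if start > last_stop:
                # plain text between the previous match and this one
                ret.append((None, 0, string[last_stop:start]))
            ret.append((dt, status, segment))
            last_stop = stop
        if len(string) > last_stop:
            # trailing text after the last match
            ret.append((None, 0, string[last_stop:]))
    return ret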

for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)

Output:

Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 16, 0), 3, 'tomorrow at 2 pm to 4 pm')
(None, 0, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 16, 0), 2, 'at 2 pm to 4 pm')
(None, 0, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 15, 0), 2, 'at 3 pm')
(None, 0, ' to ')
(datetime.datetime(2020, 1, 14, 17, 0), 2, '5 pm')
(None, 0, '!')

The status (the second element of each tuple) tells you whether the associated datetime is actually a date (1), a time (2), a datetime (3) or neither (0). In the first two cases, the missing fields are taken from source_time, or from the current time if source_time is None.
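To get back the text with the date/time removed, which is what the question asks for, one option is to join the segments whose datetime is None. This is a minimal sketch that assumes the (dt, status, segment) tuples produced by the parse() function above; remaining_text is a hypothetical helper name, and the sample tuples are copied from the first output.

```python
from datetime import datetime

def remaining_text(parts):
    """Join the segments that carry no datetime and tidy up the whitespace."""
    # Each part is (dt, status, segment); dt is None for plain text.
    leftover = "".join(segment for dt, _status, segment in parts if dt is None)
    # Collapse the runs of spaces left behind where a date/time was cut out.
    return " ".join(leftover.split())

# Tuples copied from the first output above.
parts = [
    (None, 0, "Soccer with @homies at Payne Whitney "),
    (datetime(2020, 1, 15, 16, 0), 3, "tomorrow at 2 pm to 4 pm"),
    (None, 0, "!"),
]
print(remaining_text(parts))  # Soccer with @homies at Payne Whitney !
```

Note that punctuation that followed the date/time (the "!" here) stays in the result; stripping it is a separate decision.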

But if you examine the output closely, you will see that there is a reliability problem here. Only the third parse can be used; in the other two cases information has been lost, because the range "2 pm to 4 pm" is collapsed into a single datetime at 16:00 and the 2 pm start time disappears. Furthermore, I have no idea why the second and third strings are parsed differently.
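In the third parse the date ("tomorrow") and the times ("at 3 pm", "5 pm") arrive as separate parts, and each time-only part carries the date of the source time rather than tomorrow's date. A rough sketch of how they might be recombined is shown here. It assumes that a time-only part refers to the most recent date-only part before it, which is a heuristic of this example and not something parsedatetime provides; merge_date_into_times is a hypothetical helper name.

```python
from datetime import datetime

DATE_ONLY, TIME_ONLY = 1, 2  # status values described above

def merge_date_into_times(parts):
    """Return datetimes in which time-only parts borrow the preceding date-only part's date."""
    merged = []
    pending_date = None
    for dt, status, _segment in parts:
        if dt is None:
            continue  # plain text
        if status == DATE_ONLY:
            pending_date = dt.date()  # remember it for the times that follow
        elif status == TIME_ONLY and pending_date is not None:
            merged.append(datetime.combine(pending_date, dt.time()))
        else:
            merged.append(dt)  # full datetime, or a time with no date before it
    return merged

# Tuples copied from the third output above.
third = [
    (None, 0, "Soccer with @homies at Payne Whitney "),
    (datetime(2020, 1, 15, 9, 0), 1, "tomorrow"),
    (None, 0, " starting "),
    (datetime(2020, 1, 14, 15, 0), 2, "at 3 pm"),
    (None, 0, " to "),
    (datetime(2020, 1, 14, 17, 0), 2, "5 pm"),
    (None, 0, "!"),
]
print(merge_date_into_times(third))
# [datetime.datetime(2020, 1, 15, 15, 0), datetime.datetime(2020, 1, 15, 17, 0)]
```

In this sketch a date-only part that is never followed by a time is dropped, so it would need extending for inputs such as "Soccer tomorrow".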

An alternative library is dateparser. It looks more powerful, but has its own problems. The dateparser.search.search_dates() function comes close to what you are interested in, but I haven't been able to find out how to tell whether a parsed datetime conveys only date information, only time information, or both. Anyway, here is a function that uses search_dates() to yield output similar to the above, but without the status of each part:

from dateparser.search import search_dates

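def parse(string: str):
    """Split `string` into (datetime or None, text) parts using search_dates()."""
    ret = []
    parsed_parts = search_dates(string)  # list of (matched text, datetime), or None
    if parsed_parts:
        last_stop = 0
        for segment, dt in parsed_parts:
            # search_dates() gives no offsets, so look for the match after the previous one
            start = string.find(segment, last_stop)
            if start == -1:
                continue  # matched text not found verbatim in the input; skip it
            stop = start + len(segment)
            if start > last_stop:
                ret.append((None, string[last_stop:start]))
            ret.append((dt, segment))
            last_stop = stop
        if len(string) > last_stop:
            ret.append((None, string[last_stop:]))
    return ret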


for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)

Output:

Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 14, 0), 'tomorrow at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 726130), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 14, 0), 'at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 784468), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 15, 0), 'at 3 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 17, 0), '5 pm')
(None, '!')

I think that searching for the substring in the input is acceptable, and the parsing seems more predictable, but not knowing how to interpret each datetime is a problem.
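One partial workaround can be read off the output above: when only a date was parsed ("tomorrow"), the time of day is copied from the current time and so carries microseconds, whereas an explicitly parsed time such as "at 2 pm" has none. The sketch below turns that observation into a guess. It relies on behaviour seen in this output, not on anything dateparser documents, and it cannot detect the opposite case of a time-only match whose date was inherited; time_probably_inherited is a hypothetical helper name.

```python
from datetime import datetime

def time_probably_inherited(dt):
    """Guess whether the time of day was filled in from 'now' rather than parsed."""
    # A time typed by a user almost never has a microsecond component,
    # while datetime.now() almost always does.
    return dt.microsecond != 0

# Values copied from the second output above.
print(time_probably_inherited(datetime(2020, 1, 15, 0, 43, 0, 726130)))  # True  ('tomorrow')
print(time_probably_inherited(datetime(2020, 1, 13, 14, 0)))             # False ('at 2 pm')
```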
