After using parsedatetime to get a time structure from the input string, how does one slice the rest of the string out?
Problem description
I'm wondering how to use parsedatetime for Python to return both the time struct and the rest of the input string, with just the date/time input removed.
For example:
import parsedatetime
p = parsedatetime.Calendar()
p.parse("Soccer with @homies at Payne Whitney at 2 pm")
returns:
(time.struct_time(tm_year=2020, tm_mon=1, tm_mday=12, tm_hour=13, tm_min=9, tm_sec=59, tm_wday=6, tm_yday=12, tm_isdst=0), 0)
but I'd also like it to return:
"Soccer with @homies at Payne Whitney"
Is there a way to do that with parsedatetime, or would it require a different Python package?
PS
I promise this has a practical application; we're using it to build this: magical.app
Recommended answer
The only method of Calendar that returns that info is nlp() (which I suppose stands for Natural Language Processing). Here is a function returning all parts of the input:
import parsedatetime

calendar = parsedatetime.Calendar()

def parse(string, source_time=None):
    """Split string into (datetime or None, status, text) parts."""
    ret = []
    parsed_parts = calendar.nlp(string, source_time)
    if parsed_parts:
        last_stop = 0
        for part in parsed_parts:
            dt, status, start, stop, segment = part
            # Plain text between the previous match and this one
            if start > last_stop:
                ret.append((None, 0, string[last_stop:start]))
            ret.append((dt, status, segment))
            last_stop = stop
        # Plain text after the last match
        if len(string) > last_stop:
            ret.append((None, 0, string[last_stop:]))
    return ret

for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)
Output:
Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 16, 0), 3, 'tomorrow at 2 pm to 4 pm')
(None, 0, '!')
Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 16, 0), 2, 'at 2 pm to 4 pm')
(None, 0, '!')
Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 15, 0), 2, 'at 3 pm')
(None, 0, ' to ')
(datetime.datetime(2020, 1, 14, 17, 0), 2, '5 pm')
(None, 0, '!')
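To get back to the original question, the text with the date/time input removed can be rebuilt from these parts by keeping only the entries whose datetime is None. The following is a minimal sketch, not part of parsedatetime: the helper name split_text_and_datetimes is hypothetical, and the sample list is hard-coded from the first output above so the snippet runs without the library.

```python
from datetime import datetime

def split_text_and_datetimes(parts):
    """Separate (dt, status, segment) tuples into leftover text and datetimes.

    Hypothetical helper: expects the list shape produced by parse() above.
    """
    # Keep only the segments that were not recognised as date/time input
    text = "".join(segment for dt, _status, segment in parts if dt is None)
    datetimes = [(dt, status) for dt, status, _segment in parts if dt is not None]
    # Collapse the whitespace left behind where date/time segments were cut out
    return " ".join(text.split()), datetimes

# Hard-coded sample mirroring the first output above
sample = [
    (None, 0, "Soccer with @homies at Payne Whitney "),
    (datetime(2020, 1, 15, 16, 0), 3, "tomorrow at 2 pm to 4 pm"),
    (None, 0, "!"),
]

rest, found = split_text_and_datetimes(sample)
print(rest)
print(found)
```

Note that trailing punctuation such as the "!" stays in the leftover text; whether to strip it depends on the application.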
The status tells you whether the associated datetime is actually a date (1), a time (2), a datetime (3) or neither (0). In the first two cases, the missing fields are taken from the source_time, or from the current time if that is None.
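One way to use the status values is to merge a date-only part with the time-only parts that follow it. The sketch below assumes that a time-only part belongs to the most recent date-bearing part, which is a guess about the user's intent rather than anything parsedatetime guarantees. The function name merge_parts and the constant names are hypothetical, and the sample data is copied from the third output above.

```python
from datetime import datetime

# Status values as described above (constant names are illustrative)
DATE_ONLY, TIME_ONLY, DATE_AND_TIME = 1, 2, 3

def merge_parts(parts):
    """Attach the most recently seen date to each time-only part."""
    merged = []
    current_date = None
    for dt, status, _segment in parts:
        if status == DATE_ONLY:
            current_date = dt.date()
        elif status == DATE_AND_TIME:
            current_date = dt.date()
            merged.append(dt)
        elif status == TIME_ONLY:
            # With no earlier date, keep the date the parser filled in
            date = current_date if current_date is not None else dt.date()
            merged.append(datetime.combine(date, dt.time()))
    return merged

# Parts taken from the third output above
sample = [
    (None, 0, "Soccer with @homies at Payne Whitney "),
    (datetime(2020, 1, 15, 9, 0), 1, "tomorrow"),
    (None, 0, " starting "),
    (datetime(2020, 1, 14, 15, 0), 2, "at 3 pm"),
    (None, 0, " to "),
    (datetime(2020, 1, 14, 17, 0), 2, "5 pm"),
    (None, 0, "!"),
]

merged = merge_parts(sample)
print(merged)
```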
But if you examine the output closely, you will see that there is a reliability problem here. Only the third parse can be used; in the other two cases, information has been lost (the 2 pm start time does not appear in any returned datetime). Furthermore, I have no idea why the second and third strings are parsed differently.
An alternative library is dateparser. It looks more powerful, but has its own problems. The dateparser.search.search_dates() function comes close to what you are interested in, but I haven't been able to find out how to tell whether a parsed datetime conveys only date information, only time information, or both. Anyway, here is a function that uses search_dates() to yield an output similar to the above, but without the status of each part:
from dateparser.search import search_dates

def parse(string: str):
    """Split string into (datetime or None, text) parts."""
    ret = []
    parsed_parts = search_dates(string)
    if parsed_parts:
        last_stop = 0
        for part in parsed_parts:
            segment, dt = part
            # search_dates() gives no offsets, so locate the segment in the input
            start = string.find(segment, last_stop)
            stop = start + len(segment)
            if start > last_stop:
                ret.append((None, string[last_stop:start]))
            ret.append((dt, segment))
            last_stop = stop
        # Plain text after the last match
        if len(string) > last_stop:
            ret.append((None, string[last_stop:]))
    return ret

for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)
Output:
Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 14, 0), 'tomorrow at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')
Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 726130), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 14, 0), 'at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')
Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 784468), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 15, 0), 'at 3 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 17, 0), '5 pm')
(None, '!')
I think that searching for the substring in the input is acceptable, and the parsing seems more predictable, but not knowing how to interpret each datetime is a problem.
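As a rough workaround for the missing status, one could guess it from the matched text itself. The following is only a heuristic sketch using the standard library, not a dateparser feature: the function guess_status, the two regular expressions, and the list of filler words are all assumptions made for illustration, and they cover only simple English inputs like the ones above.

```python
import re

# Assumed clock-time patterns such as "2 pm", "2:30pm" or "14:00" (not exhaustive)
TIME_RE = re.compile(r"\b\d{1,2}(:\d{2})?\s*(am|pm)\b|\b\d{1,2}:\d{2}\b", re.IGNORECASE)
# Assumed filler words that carry no date information on their own
FILLER_RE = re.compile(r"\b(at|on|in|from|to)\b", re.IGNORECASE)

def guess_status(segment):
    """Guess 1 (date), 2 (time), 3 (both) or 0 (neither) from the matched text."""
    has_time = TIME_RE.search(segment) is not None
    # Whatever remains after removing the time and filler words is treated as a date
    remainder = FILLER_RE.sub("", TIME_RE.sub("", segment)).strip()
    has_date = bool(remainder)
    return (1 if has_date else 0) + (2 if has_time else 0)

for segment in ("tomorrow at 2 pm", "at 2 pm", "4 pm", "tomorrow"):
    print(segment, guess_status(segment))
```

The return values mirror the parsedatetime status codes so that both parse() variants could be consumed by the same downstream code, but the guess will be wrong for any phrasing the patterns do not anticipate.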