Regex to split a string on dates and keep the dates
Problem description
I have a string that I want to split on the date:
28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato
which should end up as
28/11/2016 Mushroom
05/12/2016 Carrot
12/12/2016 Broccoli
19/12/2016 Potato
The dates vary, so I can't split on a fixed string. I've worked out a regex that matches them, but I can't figure out how to keep the delimiter (the date) as well.
import re

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
replaced = re.sub(r"\d{2}/\d{2}/\d{4}\s*", ",", s)  # loses data: the dates are discarded
print(replaced)
g = re.match(r"(\d{2}/\d{2}/\d{4}\s*)(.*)", s)
if g:
    # replaced = s.replace(g.group(0), "\n" + g.group(0))  # fails
    # print(replaced)
    pass
Recommended answer
If each date is always preceded by whitespace, you can split on that whitespace:
\s+(?=\d+/\d+/\d+\s)
Details:
- \s+ - matches one or more whitespace characters
- (?=\d+/\d+/\d+\s) - a positive lookahead requiring that the whitespace is followed by one or more digits, then twice a / plus one or more digits (the date-like pattern), and then a whitespace character
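The reason the lookahead matters can be seen by comparing it with a pattern that consumes the date. The sketch below is only an illustration; the variable names `consuming` and `lookahead` are made up for this example.

```python
import re

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"

# The date is part of the match, so re.split() throws it away
# together with the surrounding whitespace.
consuming = re.split(r"\s+\d+/\d+/\d+\s", s)

# The date is only checked by the lookahead, not matched, so it
# stays at the start of the next piece.
lookahead = re.split(r"\s+(?=\d+/\d+/\d+\s)", s)

print(consuming)
print(lookahead)
```

A lookahead is zero-width: it asserts what follows without adding it to the match, which is why only the whitespace is removed.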
See the Python demo below:
import re
rx = r"\s+(?=\d+/\d+/\d+\s)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.split(rx, s)
print(results)
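To get the one-entry-per-line layout asked for in the question, the split pieces can be joined with newlines. As a minimal sketch, the same pattern also works with re.sub, which is closer to the original attempt; the names `joined` and `substituted` are illustrative only.

```python
import re

rx = r"\s+(?=\d+/\d+/\d+\s)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"

# Option 1: split, then join the pieces with newlines.
joined = "\n".join(re.split(rx, s))

# Option 2: replace the whitespace before each date with a newline.
# Only the whitespace is matched, so the dates are kept.
substituted = re.sub(rx, "\n", s)

print(joined)
```

Both options build the same string here, so the choice is mostly a matter of whether a list of entries or a single string is needed downstream.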
Alternatively, a more complex regex can be used to actually match those dates:
\b\d+/\d+/\d+.*?(?=\s*\b\d+/\d+/\d+|$)
import re
rx = r"\b\d+/\d+/\d+.*?(?=\s*\b\d+/\d+/\d+|$)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.findall(rx, s)
print(results)
Here,
- \b\d+/\d+/\d+ - matches a word boundary and a date-like pattern
- .*? - matches any zero or more characters other than line breaks, as few as possible, up to the first location that is followed by...
- (?=\s*\b\d+/\d+/\d+|$) - zero or more whitespace characters and a date-like pattern, or the end of the string ($).
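If the date and the item are needed separately, the matching approach can be extended with capturing groups. The following is a sketch under assumptions not stated in the answer: the dates are taken to be in day/month/year order, and the names `pairs` and `parsed` are hypothetical. Because \d+/\d+/\d+ is only date-like, datetime.strptime is used here to check that each match is a real date.

```python
import re
from datetime import datetime

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"

# Group 1 captures the date, group 2 the text up to the next date
# (or the end of the string). The lookahead is the one from the answer.
rx = re.compile(r"\b(\d+/\d+/\d+)\s*(.*?)(?=\s*\b\d+/\d+/\d+|$)")

pairs = rx.findall(s)  # list of (date_string, item) tuples

# Assumption: day/month/year. strptime raises ValueError for
# strings such as "99/99/2016" that only look like dates.
parsed = [(datetime.strptime(d, "%d/%m/%Y").date(), item) for d, item in pairs]

for day, item in parsed:
    print(day.isoformat(), item)
```

With two groups in the pattern, re.findall returns tuples instead of whole matches, so no further splitting of each entry is needed.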