Regex to split a string on a date and keep it


Question

I have a string that I want to split on the date:

28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato

which should end up as

 28/11/2016 Mushroom
 05/12/2016 Carrot
 12/12/2016 Broccoli
 19/12/2016 Potato

Obviously the date changes, which makes this difficult. I've worked out the regex, but I can't figure out how to keep the delimiter (the date) as well.

import re

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"

# Replacing each date with a comma loses the dates themselves
replaced = re.sub(r"\d{2}\/\d{2}\/\d{4}\s*", ",", s)
print(replaced)

# re.match only anchors at the start of the string, so this sees just the first date
g = re.match(r"(\d{2}\/\d{2}\/\d{4}\s*)(.*)", s)

if g:
    # replaced = s.replace(group(0), "\n" + g.group(0)) # fails
    # print(replaced)
    pass


Recommended answer

You may use a splitting approach if there is always whitespace between the dates:

\s+(?=\d+/\d+/\d+\s)

Details:


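  • \s+ - matches one or more whitespace characters
  • (?=\d+/\d+/\d+\s) - a positive lookahead requiring that the whitespace is immediately followed by a date-like pattern (one or more digits, then / and one or more digits, twice) and then a whitespace character. The lookahead consumes nothing, so the date stays in the result.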
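
The lookahead matters because re.split discards whatever the pattern consumes. A minimal sketch contrasting a pattern that consumes the date with the lookahead version (the variable names `consuming` and `lookahead` and the shortened input are illustrative, not part of the original answer):

```python
import re

s = "28/11/2016 Mushroom 05/12/2016 Carrot"

# The date is part of the match, so re.split throws it away
consuming = re.split(r"\s*\d+/\d+/\d+\s+", s)

# Only the whitespace is matched; the date is merely checked by the lookahead
lookahead = re.split(r"\s+(?=\d+/\d+/\d+\s)", s)

print(consuming)
print(lookahead)
```

In the first call the dates vanish from the pieces, which is the same data loss the question ran into with re.sub. In the second call each date stays attached to the text that follows it.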

See the Python demo below:

import re
rx = r"\s+(?=\d+/\d+/\d+\s)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.split(rx, s)
print(results)
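
To reproduce the one-entry-per-line layout asked for in the question, the resulting list can be joined with newlines. A small sketch, where the helper name `split_on_dates` is hypothetical:

```python
import re

def split_on_dates(text):
    """Split text at whitespace that precedes a date-like token, keeping the date."""
    return re.split(r"\s+(?=\d+/\d+/\d+\s)", text.strip())

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
lines = split_on_dates(s)

# One "date item" entry per line
print("\n".join(lines))
```

Note that the lookahead requires whitespace after the date, so a date at the very end of the string with nothing after it would not start a new entry.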

Alternatively, a more complex regex can be used to actually match those dates:

\b\d+/\d+/\d+.*?(?=\s*\b\d+/\d+/\d+|$)

See the Python demo:

import re
rx = r"\b\d+/\d+/\d+.*?(?=\s*\b\d+/\d+/\d+|$)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.findall(rx, s)
print(results)

Here,



  • \b\d+/\d+/\d+ - matches a word boundary followed by a date-like pattern
  • .*? - any 0+ characters other than line breaks, as few as possible, up to the first location that is followed by...
  • (?=\s*\b\d+/\d+/\d+|$) - 0+ whitespace characters and a date-like pattern, OR the end of the string ($).
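
If the entries need further processing, the matching approach extends to capture groups. The following is a sketch only: it assumes the dates are always DD/MM/YYYY, tightens \d+ to fixed widths on that assumption, and uses hypothetical group names `date` and `item` that do not appear in the original answer.

```python
import re
from datetime import datetime

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"

# Assumed format: two-digit day, two-digit month, four-digit year
DATE = r"\d{2}/\d{2}/\d{4}"
rx = re.compile(rf"\b(?P<date>{DATE})\s+(?P<item>.*?)(?=\s*\b{DATE}|$)")

# Pair each parsed date with the text that follows it
entries = [
    (datetime.strptime(m.group("date"), "%d/%m/%Y").date(), m.group("item"))
    for m in rx.finditer(s)
]

for day, item in entries:
    print(day.isoformat(), item)
```

Parsing with datetime.strptime also rejects strings that merely look like dates, such as 99/99/2016, by raising ValueError, which the purely digit-based patterns would accept.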
