Python regex split without empty string

Problem description

I have the following file names that exhibit this pattern:

000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...

I want to extract the two timestamps in the middle, after the second underscore '_' and before '.txt'. So I used the following Python regex string split:

time_info = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)

But this gives me two extra empty strings in the returned list:

time_info=['', '20111007T084734', '20111008T023142', '']

How do I get only the two timestamps? That is, I want:

time_info=['20111007T084734', '20111008T023142']

Solution

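Don't use re.split(); use the groups() method of the match object that re.search() returns (re.Match, called SRE_Match in older Python versions). re.split() returns the text between delimiter matches, so when the pattern matches at the very start and the very end of the string, as it does here, you get an empty string on each side.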

>>> import re
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')
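
If you would rather keep re.split(), one alternative (not what this answer recommends) is to filter out the empty strings afterwards. A minimal sketch, reusing the pattern from the question; the names DELIMS and split_timestamps are made up for illustration:

```python
import re

# Same delimiter pattern as in the question, written as a raw string.
DELIMS = re.compile(r'^[0-9]+_[LU]_|-|\.txt$')

def split_timestamps(filename):
    """Split on the delimiters and drop the empty strings that re.split()
    leaves when a delimiter matches at the start or end of the string."""
    return [part for part in DELIMS.split(filename) if part]

f = '000014_L_20111007T084734-20111008T023142.txt'
print(DELIMS.split(f))      # ['', '20111007T084734', '20111008T023142', '']
print(split_timestamps(f))  # ['20111007T084734', '20111008T023142']
```

Unlike groups(), this returns a list rather than a tuple, and it silently returns whatever pieces it finds even when the file name does not follow the expected pattern.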

You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
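
The named-group variant might look like the sketch below. The group names start and end and the helper parse_times are illustrative choices, not part of the original answer. The None check is there because re.search() returns None when nothing matches, in which case calling groupdict() directly would raise AttributeError.

```python
import re

# "start" and "end" are illustrative group names (the answer uses groupA/groupB).
PATTERN = re.compile(r'[LU]_(?P<start>\w+)-(?P<end>\w+)\.')

def parse_times(filename):
    """Return the two timestamps as a dict, or None if the name does not match."""
    match = PATTERN.search(filename)
    if match is None:
        return None
    return match.groupdict()

f = '000014_L_20111007T084734-20111008T023142.txt'
print(parse_times(f))            # {'start': '20111007T084734', 'end': '20111008T023142'}
print(parse_times('notes.txt'))  # None
```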
