Splitting a list by matching a regex to an element

Question

I have a list that has some specific elements in it. I would like to split that list into 'sublists' or different lists based on those elements. For example:

test_list = ['a and b, 123','1','2','x','y','Foo and Bar, gibberish','123','321','June','July','August','Bonnie and Clyde, foobar','today','tomorrow','yesterday']

I would like to split into sublists if an element matches 'something and something':

new_list = [['a and b, 123', '1', '2', 'x', 'y'], ['Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']]

So far I can accomplish this if there is a fixed number of items after the specific element. For example:

import re
element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')
# Take each matching element plus the next three items (a fixed width of 4)
new_list = [test_list[i:(i+4)] for i, x in enumerate(test_list) if element_regex.match(x)]

Which is almost there, but there's not always exactly three elements following the specific element of interest. Is there a better way than just looping over every single item?
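
One way to keep the slicing idea from the question while dropping the fixed width is to collect the indices of all matching elements first, then slice between consecutive indices. This is a minimal sketch; the function name `split_at_matches` is hypothetical, and, like the fixed-width version above, it assumes the list starts with a matching element (anything before the first match is dropped).

```python
import re


def split_at_matches(items, pattern):
    """Hypothetical helper: slice `items` into sublists that each start
    at an element matching `pattern`.

    Elements before the first match are dropped, as in the fixed-width
    slicing approach from the question.
    """
    # Indices of every element that starts a new sublist
    starts = [i for i, x in enumerate(items) if pattern.match(x)]
    # Each sublist runs up to the next start index, or to the end of the list
    ends = starts[1:] + [len(items)]
    return [items[s:e] for s, e in zip(starts, ends)]


test_list = ['a and b, 123', '1', '2', 'x', 'y',
             'Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August',
             'Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']
element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')

new_list = split_at_matches(test_list, element_regex)
```

Each element is still visited once to find the start indices, but the sublists themselves are produced by plain slicing, so their lengths can vary freely.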

Answer

If you want a one-liner,

from functools import reduce  # reduce is not a builtin in Python 3
new_list = reduce(lambda a, b: a[:-1] + [a[-1] + [b]] if not element_regex.match(b) or not a[0] else a + [[b]], test_list, [[]])

will do. The Pythonic way, however, would be to use a more verbose variant.
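
To see why the one-liner works, its lambda can be unpacked into a named step function (called `step` here, a hypothetical name). The accumulator starts as `[[]]`. A non-matching element is appended to the last sublist; a matching element opens a new sublist, unless the first sublist is still empty (the `not a[0]` check), which stops the result from beginning with an empty list. Because every step builds new lists rather than appending in place, this style is likely part of why the one-liner is slower than a plain loop.

```python
import re
from functools import reduce

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')


def step(acc, item):
    """One reduce step, equivalent to the lambda in the one-liner."""
    if not element_regex.match(item) or not acc[0]:
        # Not a header, or nothing collected yet: extend the last sublist.
        # This copies lists on every call, just like the one-liner.
        return acc[:-1] + [acc[-1] + [item]]
    # A header after at least one non-empty sublist: start a new sublist
    return acc + [[item]]


test_list = ['a and b, 123', '1', '2', 'x', 'y',
             'Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August',
             'Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']

new_list = reduce(step, test_list, [[]])
```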

I did some speed measurements on a 4-core i7 @ 2.1 GHz. Using the timeit module, running the one-liner above 1,000,000 times took 11.38 s. Using groupby from the itertools module (Kasra's variant from the other answer) took 9.92 s. The fastest variant is the verbose version I suggested, taking only 5.66 s:

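new_list = []
for item in test_list:
    # Open a new sublist at each matching element (or for the very first item),
    # so the result does not start with an empty sublist
    if element_regex.match(item) or not new_list:
        new_list.append([])
    new_list[-1].append(item)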
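
The groupby variant is mentioned but not shown. One common way to use `itertools.groupby` here (not necessarily the exact code from the other answer) is a key function that increments a counter at every matching element, so all items up to the next header share the same key. The sketch below also compares the three approaches with `timeit`. The function names are hypothetical, `number` is set far below 1,000,000 to keep the run short, and absolute timings will depend on the machine and Python version.

```python
import re
import timeit
from functools import reduce
from itertools import groupby

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')
test_list = ['a and b, 123', '1', '2', 'x', 'y',
             'Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August',
             'Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']


def split_reduce(items):
    # The reduce one-liner from the answer
    return reduce(lambda a, b: a[:-1] + [a[-1] + [b]]
                  if not element_regex.match(b) or not a[0] else a + [[b]],
                  items, [[]])


def split_groupby(items):
    counter = 0

    def key(item):
        # Bump the group id whenever a header element is seen;
        # groupby calls this once per item, in order
        nonlocal counter
        if element_regex.match(item):
            counter += 1
        return counter

    return [list(group) for _, group in groupby(items, key=key)]


def split_loop(items):
    # The verbose loop from the answer
    result = []
    for item in items:
        if element_regex.match(item) or not result:
            result.append([])
        result[-1].append(item)
    return result


if __name__ == '__main__':
    for func in (split_reduce, split_groupby, split_loop):
        seconds = timeit.timeit(lambda: func(test_list), number=10_000)
        print(f'{func.__name__}: {seconds:.3f}s')
```

The counter-based key avoids assuming that headers and non-headers strictly alternate, so two headers in a row still produce two separate sublists.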