Two very close regexes with lookahead assertions in Python - why does re.split() behave differently?


Question

I was trying to answer this question, where the OP has the following string:

"path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"

and wants to split it to obtain the following list:

['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I tried to solve it by using a simple lookahead assertion in a regex, (?=path:). Well, it did not work:

>>> import re
>>> s = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
>>> r = re.compile('(?=path:)')
>>> r.split(s)
['path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism']

However, in this answer, the answerer got it working by preceding the lookahead assertion with a space:

>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

Why did the regex work with the space? Why did it not work without it?

Recommended answer

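Python's re.split() has a documented limitation (in versions before 3.7): it can't split on zero-length matches. The lookahead (?=path:) asserts that "path:" follows but consumes no characters, so every match it produces is empty and re.split() ignores it. Adding the space makes the pattern consume one character, which gives re.split() a non-empty match to split on. Therefore the split only worked with the added space.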
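
The difference between the two patterns can be made visible by looking at the match spans directly. The sketch below is only an illustration, not part of the original answer: `split_before` is a hypothetical helper name, and it uses re.finditer() with slicing instead of re.split(), on the assumption that you want the same result regardless of how your Python version treats empty matches in re.split() (that handling changed in Python 3.7).

```python
import re

s = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
     "path:bte00330 Arginine and proline metabolism")

# The bare lookahead matches an empty string: start == end for every match.
zero_width = [m.span() for m in re.finditer(r'(?=path:)', s)]

# With a leading space the match consumes one character, so start < end.
consuming = [m.span() for m in re.finditer(r' (?=path:)', s)]


def split_before(pattern, text):
    """Cut text at the start of every match of pattern (hypothetical helper).

    Uses re.finditer plus slicing rather than re.split, so the result does
    not depend on how a given Python version treats empty matches.
    """
    cuts = [m.start() for m in re.finditer(pattern, text)]
    # Ignore cuts at the very start or end; they would only add empty pieces.
    bounds = [0] + [i for i in cuts if 0 < i < len(text)] + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


# The separating space stays attached to the first piece, so strip it.
parts = [p.strip() for p in split_before(r'(?=path:)', s)]
print(parts)
```

If the separator is always a single space, the one-line re.split(' (?=path:)', line) from the question remains the simpler choice; the helper is only worth it when the cut point is genuinely zero-width.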
