Python regex alternation

Question

I'm trying to find all links on a webpage of the form "http://something" or "https://something". I made a regex and it works:

L = re.findall(r"http://[^/\"]+/|https://[^/\"]+/", site_str)

But is there a shorter way to write this? I'm repeating ://[^/\"]+/ twice, probably unnecessarily. I tried various things, but none of them work. I tried:

L = re.findall(r"http|https(://[^/\"]+/)", site_str)
L = re.findall(r"(http|https)://[^/\"]+/", site_str)
L = re.findall(r"(http|https)(://[^/\"]+/)", site_str)

It's obvious I'm missing something here, or I just don't understand Python regexes well enough.

Solution

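You are using capturing groups, and .findall() alters its behaviour when a pattern contains them: it returns only the contents of the capturing groups (a list of strings for one group, a list of tuples for several) rather than the whole match. Your first attempt has an additional problem: | has the lowest precedence of any regex operator, so http|https(://[^/\"]+/) matches either the bare string http or https followed by the rest of the pattern. Your regex can be simplified, but your second and third versions will work if you use a non-capturing group (?:...) instead: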

L = re.findall(r"(?:http|https)://[^/\"]+/", site_str)
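To make the difference concrete, here is a small sketch that runs the three attempts from the question, plus the non-capturing fix, against one input. The sample string and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical sample input containing one http and one https link
site_str = '"http://someserver.com/" "https://anotherserver.com/with/path"'

patterns = {
    # Attempt 1: '|' binds loosest, so this is "http" OR "https(://...)"
    "alt_no_group": r"http|https(://[^/\"]+/)",
    # Attempt 2: one capturing group, so findall returns only that group
    "one_group": r"(http|https)://[^/\"]+/",
    # Attempt 3: two capturing groups, so findall returns tuples
    "two_groups": r"(http|https)(://[^/\"]+/)",
    # Fix: non-capturing group, so findall returns the whole match
    "non_capturing": r"(?:http|https)://[^/\"]+/",
}

results = {name: re.findall(p, site_str) for name, p in patterns.items()}
for name, found in results.items():
    print(name, found)

# alt_no_group  ['', '']   (bare "http" matched, group 1 did not participate)
# one_group     ['http', 'https']
# two_groups    [('http', '://someserver.com/'), ('https', '://anotherserver.com/')]
# non_capturing ['http://someserver.com/', 'https://anotherserver.com/']
```

The first attempt yields empty strings because the left alternative http matches at both links, and findall then reports the unmatched group as ''.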

You don't need to escape the double quote if you use single quotes around the expression, and since only the s varies between the two schemes, an optional s? works too:

L = re.findall(r'https?://[^/"]+/', site_str)

Demo:

>>> import re
>>> example = '''
... "http://someserver.com/"
... "https://anotherserver.com/with/path"
... '''
>>> re.findall(r'https?://[^/"]+/', example)
['http://someserver.com/', 'https://anotherserver.com/']
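If you do want to keep a group (for example, to know which scheme was used) and still get the whole match, one option is re.finditer, which yields match objects instead of group contents. A minimal sketch, reusing the demo input; the group name scheme is an illustrative choice:

```python
import re

example = '''
"http://someserver.com/"
"https://anotherserver.com/with/path"
'''

# A named group captures the scheme; finditer yields match objects,
# so the full match remains available via group(0).
pattern = re.compile(r'(?P<scheme>https?)://[^/"]+/')

links = [(m.group(0), m.group("scheme")) for m in pattern.finditer(example)]
print(links)
# [('http://someserver.com/', 'http'), ('https://anotherserver.com/', 'https')]
```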
