Python - find a substring between two strings based on the last occurrence of the latter string

Question

I am trying to find a substring that lies between two strings. The first string is <br> and the last string is <br><br>. The first string occurs several times, while the latter string can serve as an anchor.

Here is an example:

<div class="linkTabBl" style="float:left;padding-top:6px;width:240px">
    Anglo American plc
    <br>
    20 Carlton                 House Terrace
    <br>
    SW1Y 5AN London
    <br>
    United Kingdom
    <br><br>
    Phone : +44 (0)20 7968 8888
    <br>
    Fax : +44 (0)20 7968 8500
    <br>
    Internet : 
    <a class="pageprofil_link_blue" href="http://www.angloamerican.com" target="_blank">
        http://www.angloamerican.com
    </a>
    <br>
</div>

I am trying to get "United Kingdom". I would prefer to get this string with string manipulation, but I would also be interested if anyone can get it with BeautifulSoup (ideally using a CSS selector).

All the best.

Recommended answer

import re

html = """<div class="linkTabBl" style="float:left;padding-top:6px;width:240px">
    Anglo American plc
    <br>
    20 Carlton                 House Terrace
    <br>
    SW1Y 5AN London
    <br>
    United Kingdom
    <br><br>
    Phone : +44 (0)20 7968 8888
    <br>
    Fax : +44 (0)20 7968 8500
    <br>
    Internet : 
    <a class="pageprofil_link_blue" href="http://www.angloamerican.com" target="_blank">
        http://www.angloamerican.com
    </a>
    <br>
</div>"""

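# Capture letters/whitespace between a "<br>" line and the "<br><br>" anchor.
# "\n    " matches the newline plus the 4-space indentation used in this HTML.
res = re.findall(r'<br>\n    ([a-zA-Z\s]+)?\n    <br><br>', html)

print(res)  # ['United Kingdom']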
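The question also asks for a plain string-manipulation route. A minimal sketch follows, assuming the target text sits between the last <br> that precedes the first <br><br> and that <br><br> anchor, and assuming the tags are written exactly as lowercase <br> (no <br/> or <BR> variants). The helper name text_before_anchor is hypothetical; it only uses the built-in str.find and str.rfind.

```python
def text_before_anchor(html, opener="<br>", anchor="<br><br>"):
    """Hypothetical helper: return the text between the last `opener`
    that precedes the first `anchor`, and that anchor (or None)."""
    end = html.find(anchor)              # first occurrence of the anchor
    if end == -1:
        return None                      # anchor not present
    start = html.rfind(opener, 0, end)   # last opener lying entirely before the anchor
    if start == -1:
        return None                      # no opener before the anchor
    # strip() drops the newlines/indentation around the target text
    return html[start + len(opener):end].strip()


sample = """<div class="linkTabBl">
    SW1Y 5AN London
    <br>
    United Kingdom
    <br><br>
    Phone : +44 (0)20 7968 8888
</div>"""

print(text_before_anchor(sample))                                  # United Kingdom
print(text_before_anchor("London<br>United Kingdom<br><br>Phone"))  # United Kingdom
```

Because the rfind search is bounded by the anchor's position, the repeated <br> tags after the anchor are never considered, and strip() removes leftover whitespace, so the same helper handles both the indented and the single-line layouts.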

Note: in the regex pattern, "\n    " matches a newline followed by 4 spaces of indentation, both between the <br> and the text you are looking for and between that text and the next <br>. So if your HTML looks like this instead:

...
<br>United Kingdom<br><br>
...

you should replace

res = re.findall(r'<br>\n    ([a-zA-Z\s]+)?\n    <br><br>', html)

with

res = re.findall(r'<br>([a-zA-Z\s]+)?<br><br>', html)
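Rather than keeping two patterns, a single regex can tolerate both layouts by allowing optional whitespace instead of a fixed newline plus 4 spaces. This is a sketch under the assumption that the tags are literally <br> (lowercase, no slash or attributes); the lazy class [^<]*? keeps the capture from running across another tag. The names PATTERN, multiline and compact are illustrative. Unlike [a-zA-Z\s]+, this also accepts digits and punctuation in the captured text, which may or may not be what you want.

```python
import re

# \s* allows any amount of whitespace (including none) around the target
# text and between the two <br> tags of the anchor.
PATTERN = re.compile(r'<br>\s*([^<]*?)\s*<br>\s*<br>')

multiline = """SW1Y 5AN London
    <br>
    United Kingdom
    <br><br>
    Phone : +44 (0)20 7968 8888"""
compact = "SW1Y 5AN London<br>United Kingdom<br><br>Phone : +44 (0)20 7968 8888"

print(PATTERN.findall(multiline))  # ['United Kingdom']
print(PATTERN.findall(compact))    # ['United Kingdom']
```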

Good regex lessons can be found here: https://regexone.com/
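For the parser-based route mentioned in the question: a CSS selector such as div.linkTabBl can locate the container, but the lines separated by <br> are bare text nodes rather than elements, so a selector alone cannot target "United Kingdom". The sketch below does not use BeautifulSoup; as a dependency-free substitute it uses the standard library's html.parser.HTMLParser to remember the last non-blank text chunk and report it when two <br> tags appear back to back. The class name TextBeforeDoubleBr is hypothetical.

```python
from html.parser import HTMLParser


class TextBeforeDoubleBr(HTMLParser):
    """Hypothetical helper: remembers the last non-blank text chunk and
    reports it when two <br> tags follow each other (whitespace ignored)."""

    def __init__(self):
        super().__init__()
        self.last_text = None     # most recent non-blank text chunk
        self.prev_was_br = False  # True if no other start tag or non-blank text since the last <br>
        self.result = None        # text found before the first <br><br>

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            if self.prev_was_br and self.result is None:
                self.result = self.last_text  # second <br> of a pair
            self.prev_was_br = True
        else:
            self.prev_was_br = False

    def handle_data(self, data):
        text = data.strip()
        if text:  # whitespace-only chunks do not break a <br><br> pair
            self.last_text = text
            self.prev_was_br = False


html = """<div class="linkTabBl" style="float:left;padding-top:6px;width:240px">
    Anglo American plc
    <br>
    SW1Y 5AN London
    <br>
    United Kingdom
    <br><br>
    Phone : +44 (0)20 7968 8888
    <br>
</div>"""

parser = TextBeforeDoubleBr()
parser.feed(html)
parser.close()
print(parser.result)  # United Kingdom
```

HTMLParser lowercases tag names and, by default, routes a self-closing <br/> through handle_starttag, so this also tolerates <BR> and <br/>; end tags between two <br> tags are ignored rather than treated as a break.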
