BeautifulSoup - How to get all text between two different tags?


Problem description

I would like to get all the text between two tags:

<div class="lead">I DONT WANT this</div>

#many different tags - p, table, h2 including text that I want

<div class="image">...</div>

This is how I started:

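import urllib.request

from bs4 import BeautifulSoup

url = "http://......."
req = urllib.request.Request(url)
source = urllib.request.urlopen(req)
soup = BeautifulSoup(source, 'lxml')

# Boundary markers: the wanted content lies between these two divs
start = soup.find('div', {'class': 'lead'})
end = soup.find('div', {'class': 'image'})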

And I have no idea what to do next.

Recommended answer

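Try this code. It starts at the div with class lead, walks through the siblings that follow it, and stops when it reaches the div with class image. Every node in between is appended to a string as markup and printed at the end; this can be changed to collect something else, such as only the text. Note that it relies on both divs being siblings, that is, children of the same parent element: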

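html = ""
# Look up the end marker once instead of on every iteration
end = soup.find("div", {"class": "image"})
for tag in soup.find("div", {"class": "lead"}).next_siblings:
    if tag is end:
        break
    # str() replaces Python 2's unicode(); each node is serialized as markup
    html += str(tag)
print(html)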
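
The loop above collects markup, while the question asks for text. A minimal sketch of a text-only variant follows, assuming the two marker divs are siblings. The names `SAMPLE_HTML` and `text_between` are hypothetical and introduced here only for illustration, and the sample markup is made up to mirror the structure in the question. The built-in `html.parser` is used instead of `lxml` so the sketch runs without an extra dependency.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample markup shaped like the page described in the question
SAMPLE_HTML = """
<div class="lead">I DONT WANT this</div>
<h2>Heading</h2>
<p>First <b>paragraph</b></p>
<table><tr><td>Cell</td></tr></table>
<div class="image">...</div>
<p>After the end marker</p>
"""


def text_between(start, end):
    """Return the text of each sibling after `start`, stopping before `end`."""
    parts = []
    for node in start.next_siblings:
        if node is end:
            break
        if isinstance(node, NavigableString):
            # Bare strings between tags, usually just whitespace
            text = str(node).strip()
        else:
            # Tags: join all nested strings with a single space
            text = node.get_text(" ", strip=True)
        if text:
            parts.append(text)
    return parts


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
start = soup.find("div", {"class": "lead"})
end = soup.find("div", {"class": "image"})
result = text_between(start, end)
print(result)
```

For the sample markup this collects the heading, paragraph, and table cell text and leaves out both marker divs and everything after the end marker. If the markers are not siblings on the real page, this approach would need to be adapted, for example by iterating from a common parent.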
