Access next sibling <li> element with BeautifulSoup
Problem description
I am completely new to web parsing with Python/BeautifulSoup. I have an HTML page that contains (in part) the following:
<div id="pages">
<ul>
<li class="active"><a href="example.com">Example</a></li>
<li><a href="example.com">Example</a></li>
<li><a href="example1.com">Example 1</a></li>
<li><a href="example2.com">Example 2</a></li>
</ul>
</div>
I have to visit each link (basically each <li> element) until there are no more <li> tags present. Each time a link is clicked, its corresponding <li> element gets the class 'active'. My code is:
from bs4 import BeautifulSoup
import urllib2
import re
landingPage = urllib2.urlopen('somepage.com').read()
soup = BeautifulSoup(landingPage)
pageList = soup.find("div", {"id": "pages"})
page = pageList.find("li", {"class": "active"})
This code gives me the first <li> item in the list. My logic is to keep checking whether the next_sibling is not None. If it is not None, I make an HTTP request to the href attribute of the <a> tag in that sibling <li>. That gets me to the next page, and so on, until there are no more pages.
But I can't figure out how to get the next_sibling of the page variable given above. Is it page.next_sibling.get("href") or something like that? I looked through the documentation, but somehow couldn't find it. Can someone help, please?
Use find_next_sibling() and be explicit about which sibling element you want to find:
next_li_element = page.find_next_sibling("li")
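To illustrate why the explicit call helps, here is a minimal sketch, assuming the markup from the question is laid out with one <li> per line and parsed with the built-in "html.parser". In that case the plain next_sibling attribute returns the whitespace text node between the tags rather than the next <li>, and the href lives on the nested <a>, not on the <li> itself. The variable names raw_sibling, next_li and next_href are only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, one <li> per line.
html = """<div id="pages">
<ul>
<li class="active"><a href="example.com">Example</a></li>
<li><a href="example1.com">Example 1</a></li>
</ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")
page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})

# .next_sibling returns the very next node, here the newline text node
# between the two <li> tags, which has no usable href.
raw_sibling = page.next_sibling

# find_next_sibling("li") skips text nodes and returns the next <li> tag.
next_li = page.find_next_sibling("li")

# The href attribute belongs to the <a> nested inside the <li>.
next_href = next_li.a["href"]
```

So page.next_sibling.get("href") would not work as hoped here; going through find_next_sibling("li") and then the nested <a> does.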
next_li_element will be None if page corresponds to the last <li> in the list:
if next_li_element is None:
    # no more pages to go
    pass
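Putting the pieces together, the page-by-page loop described in the question could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: next_page_url, crawl, make_page, URLS and FAKE_SITE are hypothetical names, and the fetch step is passed in as a function so the example runs against an in-memory stand-in for the site instead of making real HTTP requests.

```python
from bs4 import BeautifulSoup


def next_page_url(html):
    """Return the href of the <li> after the active one, or None on the last page."""
    soup = BeautifulSoup(html, "html.parser")
    page_list = soup.find("div", {"id": "pages"})
    if page_list is None:
        return None
    active = page_list.find("li", {"class": "active"})
    if active is None:
        return None
    next_li = active.find_next_sibling("li")
    if next_li is None or next_li.a is None:
        return None  # no more pages to go
    return next_li.a.get("href")


def crawl(fetch, start_url):
    """Follow the pager from start_url; fetch(url) must return that page's HTML."""
    visited = []
    url = start_url
    while url is not None and url not in visited:  # also guards against loops
        visited.append(url)
        url = next_page_url(fetch(url))
    return visited


def make_page(active_index, urls):
    """Build a fake page whose pager marks urls[active_index] as active."""
    items = "\n".join(
        '<li{}><a href="{}">Page</a></li>'.format(
            ' class="active"' if i == active_index else "", u
        )
        for i, u in enumerate(urls)
    )
    return '<div id="pages">\n<ul>\n{}\n</ul>\n</div>'.format(items)


# In-memory stand-in for the site (hypothetical data).
URLS = ["example.com", "example1.com", "example2.com"]
FAKE_SITE = {u: make_page(i, URLS) for i, u in enumerate(URLS)}

visited = crawl(FAKE_SITE.__getitem__, "example.com")
```

Against a real site, fetch would wrap an HTTP call such as urllib.request.urlopen(url).read() on Python 3, and relative href values would likely need to be resolved with urllib.parse.urljoin before being requested.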