Access next sibling <li> element with BeautifulSoup
Question
I am completely new to web parsing with Python/BeautifulSoup. I have an HTML page that contains (in part) the following code:
<div id="pages">
<ul>
<li class="active"><a href="example.com">Example</a></li>
<li><a href="example.com">Example</a></li>
<li><a href="example1.com">Example 1</a></li>
<li><a href="example2.com">Example 2</a></li>
</ul>
</div>
I have to visit each link (basically each <li> element) until there are no more <li> tags present. Each time a link is clicked, its corresponding <li> element gets the class 'active'. My code is:
from bs4 import BeautifulSoup
import urllib2
import re
landingPage = urllib2.urlopen('somepage.com').read()
soup = BeautifulSoup(landingPage)
pageList = soup.find("div", {"id": "pages"})
page = pageList.find("li", {"class": "active"})
This code gives me the first <li> item in the list. My logic is to keep checking whether next_sibling is not None. If it is not None, I make an HTTP request to the href attribute of the <a> tag in that sibling <li>. That gets me to the next page, and so on, until there are no more pages.
But I can't figure out how to get the next_sibling of the page variable given above. Is it page.next_sibling.get("href") or something like that? I looked through the documentation but couldn't find it. Can someone please help?
Recommended answer
Use find_next_sibling() and be explicit about which sibling element you want to find:
next_li_element = page.find_next_sibling("li")
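For context on why plain next_sibling is awkward here: in the markup from the question the <li> tags are separated by newlines, so page.next_sibling is most likely a whitespace text node rather than the next <li>. The following is a minimal sketch of the difference, assuming the sample HTML from the question and Python's built-in "html.parser" (the parser choice and the variable names raw_sibling and next_href are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """<div id="pages">
<ul>
<li class="active"><a href="example.com">Example</a></li>
<li><a href="example.com">Example</a></li>
<li><a href="example1.com">Example 1</a></li>
<li><a href="example2.com">Example 2</a></li>
</ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")
page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})

# next_sibling returns the very next node, which here is the newline
# between the two <li> tags, not a tag.
raw_sibling = page.next_sibling

# find_next_sibling("li") skips text nodes and returns the next <li> tag.
next_li_element = page.find_next_sibling("li")

# The href lives on the <a> inside the <li>, not on the <li> itself.
next_href = next_li_element.a.get("href")
```

Calling .get("href") directly on the <li> would return None, because the attribute belongs to the nested <a> tag.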
next_li_element will be None if page corresponds to the last <li> in the list, that is, when the active item has no following <li> sibling:
if next_li_element is None:
    # no more pages to go
    pass
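Putting the pieces together, the check above can drive a loop. This is only a sketch under stated assumptions: collect_following_hrefs is a hypothetical helper, it walks a single already-downloaded document using the sample HTML from the question, and it uses "html.parser". On the real site each request returns a new page with a different active <li>, so the fetch-and-reparse step would have to be added around it.

```python
from bs4 import BeautifulSoup


def collect_following_hrefs(html):
    """Return the href of every <li> after the active one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    page_list = soup.find("div", {"id": "pages"})
    if page_list is None:
        return []
    page = page_list.find("li", {"class": "active"})
    if page is None:
        return []

    hrefs = []
    next_li = page.find_next_sibling("li")
    while next_li is not None:
        link = next_li.find("a")
        if link is not None and link.get("href"):
            hrefs.append(link.get("href"))
        # Move on; this becomes None after the last <li>.
        next_li = next_li.find_next_sibling("li")
    return hrefs


# Sample markup copied from the question.
sample = """<div id="pages">
<ul>
<li class="active"><a href="example.com">Example</a></li>
<li><a href="example.com">Example</a></li>
<li><a href="example1.com">Example 1</a></li>
<li><a href="example2.com">Example 2</a></li>
</ul>
</div>"""

hrefs = collect_following_hrefs(sample)
```

The loop ends as soon as find_next_sibling("li") returns None, which is the "no more pages" condition described above.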