Access next sibling <li> element with BeautifulSoup

Published: 2021/12/23 20:43:26 · Tags: python, html, beautifulsoup


Problem description


I am completely new to web parsing with Python/BeautifulSoup. I have an HTML page that contains (among other things) the following markup:

<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>

I have to visit each link (basically each <li> element) until there are no more <li> elements left. Each time a link is clicked, its corresponding <li> element gets the class 'active'. My code is:

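from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

# 'somepage.com' is the question's placeholder; urlopen() needs a scheme
landingPage = urlopen('http://somepage.com').read()
soup = BeautifulSoup(landingPage, "html.parser")

# The pagination links live inside <div id="pages">
pageList = soup.find("div", {"id": "pages"})

# The <li> for the page currently shown carries class="active"
page = pageList.find("li", {"class": "active"})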
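To try the same lookup without a network connection, here is a minimal sketch that parses the HTML snippet above directly instead of fetching `somepage.com`. `SAMPLE_HTML` is a name introduced only for this example, and `html.parser` is Python's built-in parser.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the HTML snippet from the question
SAMPLE_HTML = """
<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>
"""

# Naming the parser avoids bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

pageList = soup.find("div", {"id": "pages"})
page = pageList.find("li", {"class": "active"})

# page is the <li> Tag itself; the link is on its child <a>
print(page["class"])   # ['active']
print(page.a["href"])  # example.com
```

Note that `class` is a multi-valued attribute in BeautifulSoup, so it comes back as a list.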

This code gives me the first <li> item in the list (the active one). My logic is to keep checking whether next_sibling is None. If it is not, I make an HTTP request to the href attribute of the <a> tag inside that sibling <li>. That takes me to the next page, and so on, until there are no more pages.

But I can't figure out how to get the next_sibling of the page variable given above. Is it page.next_sibling.get("href") or something like that? I looked through the documentation, but somehow couldn't find it. Can someone help please?
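A small sketch shows why `page.next_sibling.get("href")` does not return the link. It parses the question's HTML (inlined as a hypothetical `SAMPLE_HTML` string, indentation included) with the built-in `html.parser` and exposes two separate problems: `.next_sibling` returns the very next node, which can be a whitespace text node, and even the next `<li>` has no `href` of its own.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical inline copy of the question's HTML, indentation included
SAMPLE_HTML = """
<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})

# Problem 1: .next_sibling is the newline-plus-indentation text between
# the two <li> tags, a NavigableString rather than a Tag
raw = page.next_sibling
print(isinstance(raw, NavigableString), repr(raw))

# Skipping text nodes by hand does reach the next <li> ...
sibling = page.next_sibling
while sibling is not None and not isinstance(sibling, Tag):
    sibling = sibling.next_sibling

# Problem 2: ... but href belongs to the child <a>, not to the <li>
print(sibling.get("href"))  # None
print(sibling.a["href"])    # example.com
```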

Solution

Use find_next_sibling() and be explicit about which sibling element you want to find:

next_li_element = page.find_next_sibling("li")
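Applied to the question's HTML (inlined here as a hypothetical `SAMPLE_HTML` string), a minimal sketch of this call, plus reading the link from the `<a>` inside the returned `<li>`, might look like this:

```python
from bs4 import BeautifulSoup

# Hypothetical inline copy of the question's HTML
SAMPLE_HTML = """
<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
pageList = soup.find("div", {"id": "pages"})
page = pageList.find("li", {"class": "active"})

# find_next_sibling("li") skips whitespace text nodes and returns the next <li> Tag
next_li_element = page.find_next_sibling("li")

# The href is an attribute of the <a> inside the <li>, not of the <li> itself
next_href = next_li_element.find("a").get("href")
print(next_href)  # example.com
```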

next_li_element will be None if page is the last <li> in the list, that is, when the last page is the active one:

if next_li_element is None:
    # no more pages to go
    pass
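Putting the pieces together, the loop described in the question might look like the sketch below. `crawl` and `fetch_html` are hypothetical names: `fetch_html` stands for whatever function downloads a page (for example a wrapper around `urllib.request.urlopen`), and the hrefs are assumed to be fetchable as-is. Each fetched page is re-parsed because, per the question, the `active` class moves to the current page's `<li>`.

```python
from bs4 import BeautifulSoup


def crawl(start_url, fetch_html):
    """Visit pages by following the <li> after the active one.

    fetch_html is a hypothetical callable taking a URL and returning HTML,
    e.g. lambda u: urllib.request.urlopen(u).read().
    Returns the URLs in the order they were visited.
    """
    visited = []
    url = start_url
    # Stop when there is no next <li>, or if a link leads back to a page
    # already seen (guards against an endless loop)
    while url is not None and url not in visited:
        visited.append(url)
        soup = BeautifulSoup(fetch_html(url), "html.parser")
        page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})
        next_li_element = page.find_next_sibling("li")
        if next_li_element is None:
            # no more pages to go
            url = None
        else:
            # Assumes hrefs can be fetched as-is; relative links would
            # normally be resolved with urllib.parse.urljoin(url, href)
            url = next_li_element.a["href"]
    return visited
```

The `url not in visited` check is a design choice for safety: in the question's snippet the active page and the next `<li>` both link to `example.com`, so a page that links back to itself would otherwise be fetched forever.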
