Access next sibling <li> element with BeautifulSoup

Question

I am completely new to web parsing with Python/BeautifulSoup. I have an HTML page, part of which looks like this:

<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>

I have to visit each link (basically each <li> element) until there are no more <li> tags left. Each time a link is clicked, its corresponding <li> element gets the class 'active'. My code is:

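from urllib.request import urlopen

from bs4 import BeautifulSoup

# Python 3: urllib2 was split into urllib.request and urllib.error.
# 'somepage.com' is a placeholder; a real URL needs a scheme such as http://
landingPage = urlopen('somepage.com').read()
# Name the parser explicitly so results do not depend on what is installed
soup = BeautifulSoup(landingPage, 'html.parser')

pageList = soup.find("div", {"id": "pages"})

# The <li> for the page currently being viewed
page = pageList.find("li", {"class": "active"})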

This code gives me the first <li> item in the list. My plan is to keep checking whether the next sibling is None. If it is not None, I make an HTTP request to the href attribute of the <a> tag inside that sibling <li>. That gets me to the next page, and so on, until there are no more pages.

But I can't figure out how to get the next sibling of the page variable above. Is it page.next_sibling.get("href") or something like that? I looked through the documentation but couldn't find it. Can someone please help?

Recommended answer

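Use find_next_sibling() and be explicit about which sibling element you want to find. Plain page.next_sibling can return the whitespace text node that sits between the tags rather than the next <li>: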

next_li_element = page.find_next_sibling("li")
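
To make the difference concrete, here is a minimal sketch that parses the HTML from the question as an inline string (an assumption made for illustration, so no network request is needed). The names html, raw_sibling and next_href are hypothetical and not part of the original code.

```python
from bs4 import BeautifulSoup

# The markup from the question, inlined for illustration
html = """
<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})

# .next_sibling returns whatever node comes next, here the newline and
# indentation between the two <li> tags, which has no href to read
raw_sibling = page.next_sibling

# find_next_sibling("li") skips text nodes and returns the next <li> tag
next_li_element = page.find_next_sibling("li")
next_href = next_li_element.a["href"]
```

The href lives on the <a> tag inside the <li>, not on the <li> itself, so it is read through next_li_element.a rather than next_li_element.get("href").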

next_li_element will be None if page corresponds to the last li in the list:

if next_li_element is None:
    # no more pages to go
    pass
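
Putting the None check into a loop, the walk over the remaining <li> elements might look like the sketch below. It only collects the href values from a single parsed document; in the flow described in the question, each href would instead be requested and the new page re-parsed. The helper name remaining_hrefs and the inline html string are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def remaining_hrefs(page):
    """Collect the href of every <li> sibling that follows `page`."""
    hrefs = []
    next_li_element = page.find_next_sibling("li")
    while next_li_element is not None:
        link = next_li_element.find("a")
        # Skip any <li> that has no link or no href attribute
        if link is not None and link.get("href"):
            hrefs.append(link["href"])
        next_li_element = next_li_element.find_next_sibling("li")
    return hrefs


# The markup from the question, inlined for illustration
html = """
<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})
print(remaining_hrefs(page))  # ['example.com', 'example1.com', 'example2.com']
```

The loop ends when find_next_sibling("li") returns None, which is the "no more pages to go" case.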
