Beautiful Soup Nested Tag Search

Question

I am trying to write a Python program that counts the words on a web page. I use Beautiful Soup 4 to scrape the page, but I have difficulty accessing nested HTML tags (for example, <p class="hello"> inside a <div>).

Every time I try to find such a tag with page.findAll() (where page is the Beautiful Soup object containing the whole page), it doesn't find any, even though they are there. Is there a simple method, or another way to do this?
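By default, find_all() (the Beautiful Soup 4 name for findAll()) searches every descendant of the object it is called on, not just its direct children, so a nested <p class="hello"> should normally be found straight from the top-level soup. The minimal sketch below, using hypothetical sample markup, shows this and contrasts it with recursive=False. It also uses class_ with a trailing underscore, since class is a reserved word in Python; a mistyped attribute filter is one possible (assumed, not confirmed) reason a search comes back empty.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a <p class="hello"> nested inside a <div>
html = """
<html><body>
  <div class="some_class">
    <p class="hello">Hello nested world</p>
    <p>Not matched: no class</p>
  </div>
  <p class="hello">Top-level paragraph</p>
</body></html>
"""

page = BeautifulSoup(html, "html.parser")

# find_all() is recursive by default, so the nested <p> inside the <div>
# is found directly from the top-level soup object.
hello_ps = page.find_all("p", class_="hello")
texts = [p.get_text(strip=True) for p in hello_ps]
print(texts)

# With recursive=False only the soup's direct children are examined,
# and none of them is a <p>, so nothing is found.
direct = page.find_all("p", recursive=False)
print(len(direct))
```

If a search like this still finds nothing on a real page, it may be worth checking whether the tags are actually present in the downloaded HTML at all (for example, whether they are added later by JavaScript).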

Answer

My guess is that you want to first look inside a specific div tag, then search it for all p tags (to count them, or to do whatever else you need). For example:

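import bs4

# content holds the page's HTML as a string (downloaded earlier)
soup = bs4.BeautifulSoup(content, 'html.parser')

# This will get the first div with class "some_class" (or None if there is none)
div_container = soup.find('div', class_='some_class')

# Then search in that div_container for all p tags with class "hello"
if div_container is not None:
    for ptag in div_container.find_all('p', class_='hello'):
        # prints the p tag's text content
        print(ptag.text)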
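The same nested lookup can also be written as a single CSS selector with select(), where the space between the two parts means "anywhere inside". The sketch below assumes the same hypothetical some_class / hello class names and a made-up page, and adds a rough word count (splitting each paragraph's text on whitespace), since counting words was the original goal.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; in practice this would be the downloaded HTML
content = """
<div class="some_class">
  <p class="hello">Beautiful Soup makes nested searches easy.</p>
  <p class="hello">Count these words too.</p>
  <p class="other">Ignored paragraph.</p>
</div>
<p class="hello">Outside the div, so not counted.</p>
"""

soup = BeautifulSoup(content, "html.parser")

# The descendant combinator (space) matches <p class="hello"> anywhere
# inside <div class="some_class">, however deeply nested.
paragraphs = soup.select("div.some_class p.hello")

# Rough word count: split each paragraph's text on whitespace
word_counts = [len(p.get_text().split()) for p in paragraphs]
total_words = sum(word_counts)

print(len(paragraphs), word_counts, total_words)
```

Whitespace splitting treats punctuation attached to a word as part of that word, which is usually good enough for a rough count; a stricter definition of "word" would need its own tokenizing rule.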

Hope that helps.