Beautiful Soup Nested Tag Search

Question

I am trying to write a Python program that counts the words on a web page. I use Beautiful Soup 4 to scrape the page, but I have difficulty accessing nested HTML tags (for example, <p class="hello"> inside a <div>).

Every time I try to find such a tag with page.findAll() (where page is the Beautiful Soup object containing the whole page), it doesn't find any, even though they are there. Is there a simple method or another way to do this?
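For reference, here is a minimal sketch showing that find_all() (findAll() is its older camelCase alias) already looks inside nested tags: it searches all descendants unless recursive=False is passed. The sample markup and variable names are made up for illustration and assume a static HTML page. If the same call returns nothing on a real page, the fetched markup may differ from what the browser shows (for example, content generated by JavaScript), though that is only one possible cause.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: a <p class="hello"> nested inside a <div>
html = """
<html><body>
  <div id="main">
    <p class="hello">Hello nested world</p>
    <p>Not selected</p>
  </div>
</body></html>
"""

page = BeautifulSoup(html, "html.parser")

# find_all() searches every descendant by default (recursive=True),
# so the <p> is found even though it sits inside a <div>.
nested = page.find_all("p", class_="hello")
print(len(nested))            # 1
print(nested[0].get_text())   # Hello nested world

# With recursive=False only direct children of <body> are examined;
# the <p> is a grandchild of <body>, so nothing is found.
direct = page.body.find_all("p", recursive=False)
print(len(direct))            # 0
```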

Answer

My guess is that you want to first find a specific div tag, then search all the p tags inside it and count them (or do whatever else you need). For example:

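import bs4

# content is the HTML of the scraped page (e.g. the text of the HTTP response)
soup = bs4.BeautifulSoup(content, 'html.parser')

# This gets the first div with class "some_class" (a placeholder name)
div_container = soup.find('div', class_='some_class')

# find() returns None when no matching div exists
if div_container is not None:
    # Then search inside div_container for all p tags with class "hello"
    for ptag in div_container.find_all('p', class_='hello'):
        # Print the text content of each p tag
        print(ptag.text)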
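A self-contained version of the approach above, as a sketch: the HTML string stands in for the scraped page, 'some_class' and 'hello' are placeholder class names, and splitting text on whitespace is a deliberately simple, assumed definition of a "word". It also shows soup.select() with a CSS descendant selector, which Beautiful Soup offers as an alternative way to express the same nested search.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the scraped page
html = """
<div class="some_class">
  <p class="hello">Hello there, world</p>
  <p class="hello">Beautiful Soup is handy</p>
  <p class="other">Skipped paragraph</p>
</div>
<p class="hello">Outside the div, also skipped</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: locate the container div
div_container = soup.find("div", class_="some_class")

# Step 2: search only inside that div for <p class="hello">
hello_ps = div_container.find_all("p", class_="hello") if div_container else []

# Count the matching tags and the whitespace-separated words in their text
tag_count = len(hello_ps)
word_count = sum(len(p.get_text().split()) for p in hello_ps)
print(tag_count, word_count)  # 2 7

# Same nested search written as a CSS descendant selector
same_ps = soup.select("div.some_class p.hello")
print(len(same_ps))  # 2
```

The CSS selector form is compact when the nesting reads naturally as a path; the two-step find()/find_all() form makes it easier to handle a missing container explicitly.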

Hope this helps.
