Beautiful Soup Nested Tag Search
Problem description
I am trying to write a Python program that counts the words on a web page. I use Beautiful Soup 4 to scrape the page, but I have difficulty accessing nested HTML tags (for example, a <p class="hello"> inside a <div>).
Every time I try to find such a tag with page.findAll() (where page is the Beautiful Soup object containing the whole page), it finds nothing, even though the tags are there. Is there a simple method or another way to do this?
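For reference, here is a minimal sketch, assuming a small hypothetical HTML string in place of the real scraped page. In Beautiful Soup 4, `findAll()` is the older camelCase alias of `find_all()`, and both search all descendants by default (`recursive=True`), so a `<p class="hello">` nested inside a `<div>` should be found from the top-level soup. The class filter uses the keyword `class_` with a trailing underscore because `class` is a reserved word in Python. Passing `recursive=False` limits the search to direct children only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for the scraped page
html = """
<html><body>
  <div>
    <p class="hello">Hello nested world</p>
    <p>Plain paragraph</p>
  </div>
  <p class="hello">Top level</p>
</body></html>
"""

page = BeautifulSoup(html, "html.parser")

# find_all() walks every descendant, so the <p> nested in <div> is included
hello_tags = page.find_all("p", class_="hello")
for tag in hello_tags:
    print(tag.get_text())

# recursive=False only looks at direct children of <body>,
# so the nested <p class="hello"> is skipped here
direct_hello = page.body.find_all("p", class_="hello", recursive=False)
print(len(direct_hello))
```

If a search like this returns nothing on the real page, it is worth checking that the tag and class names match the HTML that was actually downloaded.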
Recommended answer
I'm guessing that what you want to do is first find a specific div tag, then search for all p tags inside it and count them, or do whatever else you need. For example:
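import bs4

# content holds the HTML of the page you scraped
soup = bs4.BeautifulSoup(content, 'html.parser')
# Get the first div with class "some_class"
div_container = soup.find('div', class_='some_class')
# find() returns None when no matching div exists
if div_container is not None:
    # Then search inside div_container for all p tags with class "hello"
    for ptag in div_container.find_all('p', class_='hello'):
        # Print the text content of each p tag
        print(ptag.text)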
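Since the original goal was counting words, the following sketch extends that approach. It assumes hypothetical helper names (`count_words`, `count_words_css`), the same placeholder class names `some_class` and `hello`, and that a "word" is any whitespace-separated token. The second helper shows an alternative using the CSS descendant selector `div.some_class p.hello` with `select()`. Note that `select()` matches inside every `div.some_class`, whereas `find()` only uses the first one.

```python
from bs4 import BeautifulSoup


def count_words(html: str) -> int:
    """Hypothetical helper: count words in <p class="hello"> tags
    inside the first <div class="some_class">."""
    soup = BeautifulSoup(html, "html.parser")
    div_container = soup.find("div", class_="some_class")
    if div_container is None:
        # No matching div, so there is nothing to count
        return 0
    total = 0
    for ptag in div_container.find_all("p", class_="hello"):
        # str.split() with no argument splits on any run of whitespace
        total += len(ptag.get_text().split())
    return total


def count_words_css(html: str) -> int:
    """Hypothetical helper: same count via a CSS descendant selector.
    Unlike find(), this covers every div.some_class on the page."""
    soup = BeautifulSoup(html, "html.parser")
    return sum(
        len(p.get_text().split())
        for p in soup.select("div.some_class p.hello")
    )


sample = """
<div class="some_class">
  <p class="hello">one two three</p>
  <p class="other">ignored words here</p>
  <p class="hello">four five</p>
</div>
<p class="hello">outside the div</p>
"""

print(count_words(sample))      # 5
print(count_words_css(sample))  # 5
```

The `<p class="other">` and the `<p class="hello">` outside the div are excluded, so only "one two three" and "four five" are counted.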
Hope this helps.