How to select a div with a given class inside another div with Beautiful Soup?


Question

I have a bunch of div tags nested within div tags:

<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
</div>
<div class="bar">Don't want this either
</div>

So I'm using Python and Beautiful Soup to separate things out. I need every "bar" div, but only when it is wrapped inside a "foo" div. Here's my code:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(r'C:\test.htm'))
tag = soup.div
for each_div in soup.findAll('div',{'class':'foo'}):
    print(tag["bar"]).encode("utf-8")

Alternatively, I tried:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(r'C:\test.htm'))
for each_div in soup.findAll('div',{'class':'foo'}):
     print(each_div.findAll('div',{'class':'bar'})).encode("utf-8")

What am I doing wrong? I would be just as happy with a simple print(each_div) if I could remove the "unwanted" div from the selection.

Recommended answer

You can use find_all() to find every <div> element whose class is foo and, for each one, call find() to get the nested <div> whose class is bar, like:

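from bs4 import BeautifulSoup
import sys

# Parse the HTML file named on the command line.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for foo in soup.find_all('div', attrs={'class': 'foo'}):
    # find() returns only the first matching descendant, or None.
    bar = foo.find('div', attrs={'class': 'bar'})
    if bar is not None:
        print(bar.text)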

Run it like:

python3 script.py htmlfile

That yields:

I want this
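
As for why the attempts in the question fail, the sketch below reproduces both problems on an inline copy of the sample HTML. The html string and the variable names are illustrative and not from the original post. In Python 3, print() returns None, so chaining .encode("utf-8") onto it raises AttributeError. Separately, tag["bar"] looks up an HTML attribute named bar rather than a class value, and find_all() returns a list-like ResultSet rather than a string.

```python
from bs4 import BeautifulSoup

# Inline copy of the sample markup (illustrative; the question reads a file).
html = """
<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
</div>
<div class="bar">Don't want this either
</div>
"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.div  # the first <div> in the document, i.e. the "foo" one

# tag["bar"] asks for an HTML attribute called "bar", which does not exist here.
try:
    tag["bar"]
    attribute_lookup = "found"
except KeyError:
    attribute_lookup = "KeyError"

# find_all() returns a list-like ResultSet of tags, so read text per element.
matches = tag.find_all("div", {"class": "bar"})
texts = [m.get_text() for m in matches]

print(attribute_lookup)
print(texts)
```

Printing each element's text directly, without .encode(), avoids both problems.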


UPDATE: If a foo div can contain several <div> elements with class bar, the previous script is not enough, because find() returns only the first match. Instead, you can iterate over each foo div's descendants, like:

from bs4 import BeautifulSoup
import sys

with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for foo in soup.find_all('div', attrs={'class': 'foo'}):
    # Walk every node nested under this foo div, at any depth.
    for d in foo.descendants:
        # Text nodes have name None; tags expose their classes as a list.
        if d.name == 'div' and 'bar' in d.get('class', []):
            print(d.text)

With input like:

<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
     <div class="bar">Also want this</div>
</div>

it will yield:

I want this
Also want this
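
A shorter route, offered as a sketch rather than as part of the original answer, is Beautiful Soup's CSS selector support via select(). The example below assumes an inline html string (illustrative; the original scripts read a file) and also shows the alternative the question mentions: removing the "unwanted" divs with decompose() and then printing each foo div.

```python
from bs4 import BeautifulSoup

# Inline sample markup (illustrative; the original scripts read a file).
html = """
<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
     <div class="bar">Also want this</div>
</div>
<div class="bar">Don't want this either
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.foo div.bar" matches every bar div nested anywhere inside a foo div.
bar_texts = [div.get_text() for div in soup.select("div.foo div.bar")]
print(bar_texts)

# Alternative from the question: drop the unwanted divs, then print each foo.
for unwanted in soup.select("div.foo div.unwanted"):
    unwanted.decompose()  # removes the tag from the tree in place

foo_texts = [
    foo.get_text(" ", strip=True)
    for foo in soup.find_all("div", class_="foo")
]
print(foo_texts)
```

Use "div.foo > div.bar" instead if only direct children of the foo div should match.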
