Multiple conditions in BeautifulSoup: Text=True & IMG Alt=True

Question

Is there a way to use multiple conditions in BeautifulSoup?

These are the two conditions I would like to use together:

Get the text:

soup.find_all(text=True)

Get the img alt attributes:

soup.find_all('img', alt=True)

I know I can run them separately, but I would like to get the results together so that they keep the order in which they appear in the HTML.

The reason I'm doing this is that only BeautifulSoup extracts text that is hidden by CSS with display: none.

When you use driver.find_element_by_tag_name('body').text you get the img alt attribute, but unfortunately not the text hidden by CSS with display: none.
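To illustrate the point about hidden text, here is a minimal sketch. The HTML snippet and the variable names are made up for the example. BeautifulSoup only parses the markup and does not apply CSS, so the text of an element styled with display: none comes back like any other text node.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one visible paragraph, one hidden by inline CSS.
html = '<p>Visible</p><p style="display:none">Hidden</p>'

soup = BeautifulSoup(html, 'html.parser')

# find_all(string=True) returns every text node; CSS is never evaluated,
# so the hidden paragraph's text is included as well.
texts = [s.strip() for s in soup.find_all(string=True) if s.strip()]

print(texts)
```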

Thanks for your help!

Recommended answer

.find_all() returns either text nodes or tags, not both, but you can write your own function that walks the soup and yields both the text nodes and the values of the alt= attributes in document order.

For example:

from bs4 import BeautifulSoup, Tag, NavigableString


txt = '''
Some text
<img alt="Some alt" src="#" />
Some other text
'''

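def traverse(s):
    # Walk the tree in document order, yielding text nodes and img alt values.
    for c in s.contents:
        if isinstance(c, Tag):
            # An <img> has no text of its own, so emit its alt attribute instead.
            if c.name == 'img' and 'alt' in c.attrs:
                yield c['alt']
            # Recurse so nested text keeps its position in the output.
            yield from traverse(c)
        elif isinstance(c, NavigableString):
            yield c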


soup = BeautifulSoup(txt, 'html.parser')

for text in traverse(soup):
    print(text.strip())

Prints:

Some text
Some alt
Some other text
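As a variation on the same idea, the recursion can be replaced by a flat loop over soup.descendants, which visits every tag and text node in document order. This is only a sketch: the helper name texts_and_alts and the sample markup are assumptions made for the example, and unlike the code above it skips whitespace-only strings and empty alt values.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def texts_and_alts(soup):
    """Collect non-empty text nodes and img alt values in document order.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    out = []
    for node in soup.descendants:
        if isinstance(node, Tag):
            # Only <img> tags with a non-empty alt attribute contribute.
            if node.name == 'img' and node.get('alt'):
                out.append(node['alt'].strip())
        elif isinstance(node, NavigableString):
            stripped = node.strip()
            if stripped:  # skip whitespace-only text nodes
                out.append(stripped)
    return out


# Made-up sample that also contains text hidden with display:none.
sample = (
    '<div>Some text <img alt="Some alt" src="#"> '
    '<span style="display:none">Hidden</span></div>'
)
result = texts_and_alts(BeautifulSoup(sample, 'html.parser'))
print(result)
```

Returning a list rather than yielding makes the result easy to join or post-process; the generator version above is the better fit if the document is large and the items are consumed one at a time.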
