Multiple conditions in BeautifulSoup: Text=True & IMG Alt=True
Question
Is there a way to use multiple conditions in BeautifulSoup?
These are the two conditions I would like to use together:
Get the text:
soup.find_all(text=True)
Get the img alt attributes:
soup.find_all('img', alt=True)
I know I can run them separately, but I would like to get both results together so that they keep the order in which they appear in the HTML.
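To illustrate the ordering problem, here is a minimal sketch. The HTML snippet is invented for illustration, and `string=True` is used as the newer spelling of `text=True` in recent bs4 releases. Running the two queries separately gives two independent lists, so the position of the alt text relative to the surrounding text is lost.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: an image sitting between two paragraphs.
html = '<p>Before</p><img alt="Logo" src="#"><p>After</p>'
soup = BeautifulSoup(html, "html.parser")

# Two separate queries: one for text nodes, one for images with an alt attribute.
texts = [str(s) for s in soup.find_all(string=True)]
alts = [img["alt"] for img in soup.find_all("img", alt=True)]

print(texts)  # ['Before', 'After']
print(alts)   # ['Logo']
```

Nothing in either list says that "Logo" belongs between "Before" and "After", which is why a single pass over the tree is needed.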
The reason I'm doing this is that only BeautifulSoup extracts text that is hidden by CSS (display: none).
When you use driver.find_element_by_tag_name('body').text, you get the img alt attribute, but unfortunately not the text hidden by CSS display: none.
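As a small sketch of the BeautifulSoup side of that observation (the markup is invented for illustration): the parser does not evaluate CSS, so text inside an element styled with `display:none` is returned like any other text.

```python
from bs4 import BeautifulSoup

# Made-up sample markup with one visible and one CSS-hidden element.
html = '<div>Visible</div><div style="display:none">Hidden</div>'
soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields every text node with surrounding whitespace removed;
# no styling is taken into account.
strings = list(soup.stripped_strings)
print(strings)  # ['Visible', 'Hidden']
```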
Thanks for your help!
Recommended answer
.find_all() returns only texts or tags, but you can write your own function that walks the soup and returns both the text nodes and the text from the alt= attributes.
For example:
from bs4 import BeautifulSoup, Tag, NavigableString

txt = '''
Some text
<img alt="Some alt" src="#" />
Some other text
'''

def traverse(s):
    # Walk the direct children in document order.
    for c in s.contents:
        if isinstance(c, Tag):
            # For <img> tags, yield the alt text if the attribute is present.
            if c.name == 'img' and 'alt' in c.attrs:
                yield c['alt']
            # Recurse into the tag to reach nested text and images.
            yield from traverse(c)
        elif isinstance(c, NavigableString):
            # Plain text node.
            yield c

soup = BeautifulSoup(txt, 'html.parser')
for text in traverse(soup):
    print(text.strip())
Prints:
Some text
Some alt
Some other text
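A variant of the same idea, offered as a sketch rather than as part of the original answer: bs4's `.descendants` iterator already visits every node in document order, so the explicit recursion can be dropped. The helper name `texts_and_alts` and the sample markup are made up for illustration. Unlike the function above, this version skips whitespace-only strings and empty alt values.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

def texts_and_alts(soup):
    """Yield non-empty text nodes and img alt values in document order."""
    for node in soup.descendants:
        if isinstance(node, Tag):
            # Only <img> tags contribute, and only if alt is non-empty.
            if node.name == "img" and node.get("alt"):
                yield node["alt"]
        elif isinstance(node, NavigableString):
            text = node.strip()
            if text:
                yield text

# Made-up sample: inline image plus an element hidden with CSS.
html = '''
<p>Some text <img alt="Some alt" src="#"> more text</p>
<div style="display:none">Hidden text</div>
'''
result = list(texts_and_alts(BeautifulSoup(html, "html.parser")))
print(result)  # ['Some text', 'Some alt', 'more text', 'Hidden text']
```

Note that comments are also NavigableString subclasses, so they would be yielded too; filter them out if the page contains any.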