Python BeautifulSoup: get all href values in the children of a div
Question
I am new to Python and I've been trying to get the links and their inner text from this HTML:
<div class="someclass">
    <ul class="listing">
        <li>
            <a href="http://link1.com" title="">title1</a>
        </li>
        <li>
            <a href="http://link2.com" title="">title2</a>
        </li>
        <li>
            <a href="http://link3.com" title="">title3</a>
        </li>
        <li>
            <a href="http://link4.com" title="">title4</a>
        </li>
    </ul>
</div>
I want all of the links, and only the links, from the href attributes (for example http://link1.com), together with the inner text (for example title1).
I tried this code:
div = soup.find_all('ul',{'class':'listing'})
for li in div:
    all_li = li.find_all('li')
    for link in all_li.find_all('a'):
        print(link.get('href'))
but no luck. Can someone help me?
Recommended answer
The problem is that find_all() returns a list of elements, and in your second for loop you call find_all() on that list instead of on a single element. Iterate over the li elements and call find() on each one:
>>> for ul in soup.find_all('ul', class_='listing'):
...     for li in ul.find_all('li'):
...         a = li.find('a')
...         print(a['href'], a.get_text())
...
http://link1.com title1
http://link2.com title2
http://link3.com title3
http://link4.com title4
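For anyone who wants to run the nested-loop approach outside the interpreter, here is a minimal self-contained sketch. It assumes the HTML from the question is held in a string, uses the built-in "html.parser" backend, and adds a guard for li elements without a link. The names HTML and extract_links are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# The markup from the question, held in a string (an assumption for this sketch).
HTML = """
<div class="someclass">
    <ul class="listing">
        <li><a href="http://link1.com" title="">title1</a></li>
        <li><a href="http://link2.com" title="">title2</a></li>
        <li><a href="http://link3.com" title="">title3</a></li>
        <li><a href="http://link4.com" title="">title4</a></li>
    </ul>
</div>
"""


def extract_links(soup):
    """Collect (href, text) pairs from every <li> inside <ul class="listing">."""
    results = []
    for ul in soup.find_all("ul", class_="listing"):
        for li in ul.find_all("li"):
            # find() returns a single Tag or None, unlike find_all().
            a = li.find("a")
            if a is not None and a.has_attr("href"):
                results.append((a["href"], a.get_text(strip=True)))
    return results


soup = BeautifulSoup(HTML, "html.parser")
links = extract_links(soup)
for href, text in links:
    print(href, text)
```

Returning a list of pairs rather than printing inside the loop keeps the extraction reusable, for example if the links need to be written to a file later.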
You can also use a CSS selector instead of the nested for loops:
>>> for a in soup.select('.listing li a'):
...     print(a['href'], a.get_text(strip=True))
...
http://link1.com title1
http://link2.com title2
http://link3.com title3
http://link4.com title4
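If the page contains links outside the div as well, the selector can be scoped to the div's class. The sketch below is one possible way to do that, not the answer's own code: the function name links_in_div and the extra "outside" link are invented for the demonstration, and the class name is assumed to be a simple identifier that needs no escaping in a CSS selector.

```python
from bs4 import BeautifulSoup

# Markup from the question plus one hypothetical link outside the div,
# added only to show that the selector ignores it.
HTML = """
<a href="http://outside.example">outside</a>
<div class="someclass">
    <ul class="listing">
        <li><a href="http://link1.com" title="">title1</a></li>
        <li><a href="http://link2.com" title="">title2</a></li>
        <li><a href="http://link3.com" title="">title3</a></li>
        <li><a href="http://link4.com" title="">title4</a></li>
    </ul>
</div>
"""


def links_in_div(html, div_class):
    """Return (href, text) pairs for every <a> with an href inside div.<div_class>."""
    soup = BeautifulSoup(html, "html.parser")
    # "a[href]" matches only anchors that actually carry an href attribute,
    # so a["href"] below cannot raise KeyError.
    selector = f"div.{div_class} a[href]"
    return [(a["href"], a.get_text(strip=True)) for a in soup.select(selector)]


pairs = links_in_div(HTML, "someclass")
for href, text in pairs:
    print(href, text)
```

The descendant combinator (the space in the selector) matches anchors at any depth under the div, so the intermediate ul and li elements do not need to be spelled out.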