Python BeautifulSoup: get all href values in the children of a div


Question

I am new to Python and I have been trying to get the links and their inner text from this HTML code:

<div class="someclass">
  <ul class="listing">
    <li>
      <a href="http://link1.com" title="">title1</a>
    </li>
    <li>
      <a href="http://link2.com" title="">title2</a>
    </li>
    <li>
      <a href="http://link3.com" title="">title3</a>
    </li>
    <li>
      <a href="http://link4.com" title="">title4</a>
    </li>
  </ul>
</div>

I want all of the links and nothing else: each href value (for example http://link1.com) together with its inner text (for example title1).

I tried this code:

div = soup.find_all('ul',{'class':'listing'})
for li in div:
    all_li = li.find_all('li')
    for link in all_li.find_all('a'):
        print(link.get('href'))

but had no luck. Can someone help me?

Recommended answer

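The problem is in your second for loop: li.find_all('li') returns a list of elements (a ResultSet), and you then call find_all('a') on that list instead of on a single element. Loop over the li elements and call find() on each one: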

>>> for ul in soup.find_all('ul', class_='listing'):
...     for li in ul.find_all('li'):
...         a = li.find('a')
...         print(a['href'], a.get_text())
... 
http://link1.com title1
http://link2.com title2
http://link3.com title3
http://link4.com title4
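
The failure in the question's code can be reproduced in isolation. The following is a minimal sketch, assuming the bs4 package is installed and using a shortened copy of the question's HTML with Python's built-in html.parser; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (two items instead of four).
html = """
<div class="someclass">
  <ul class="listing">
    <li><a href="http://link1.com" title="">title1</a></li>
    <li><a href="http://link2.com" title="">title2</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", class_="listing")

# find_all() returns a list-like ResultSet, not a single Tag.
all_li = ul.find_all("li")

# Calling find_all() on the ResultSet itself, as the question's inner
# loop does, raises AttributeError.
try:
    all_li.find_all("a")
    error = None
except AttributeError as exc:
    error = exc
print("error raised:", error is not None)

# Calling find() on each individual <li> works.
hrefs = [li.find("a")["href"] for li in all_li]
print(hrefs)
```

The message attached to the AttributeError usually points at the same cause: a list of elements is being treated like a single element.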

You can also use a CSS selector instead of the nested for loops:

>>> for a in soup.select('.listing li a'):
...     print(a['href'], a.get_text(strip=True))
... 
http://link1.com title1
http://link2.com title2
http://link3.com title3
http://link4.com title4
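
If the links are needed as data rather than printed output, the selector approach can be wrapped in a small helper. This is a sketch under assumptions: bs4 is installed, extract_links is a hypothetical name that is not part of BeautifulSoup, and the extra list item without an href is added here only to show that the a[href] selector skips it.

```python
from bs4 import BeautifulSoup

# HTML from the question, plus one hypothetical <a> without an href.
HTML = """
<div class="someclass">
  <ul class="listing">
    <li><a href="http://link1.com" title="">title1</a></li>
    <li><a href="http://link2.com" title="">title2</a></li>
    <li><a href="http://link3.com" title="">title3</a></li>
    <li><a href="http://link4.com" title="">title4</a></li>
    <li><a name="anchor-only">no link here</a></li>
  </ul>
</div>
"""


def extract_links(html, selector="div.someclass ul.listing a[href]"):
    """Return (href, text) pairs for every matching <a> that has an href.

    Hypothetical helper for illustration; the default selector limits
    the search to the div and list from the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [(a["href"], a.get_text(strip=True)) for a in soup.select(selector)]


links = extract_links(HTML)
for href, text in links:
    print(href, text)
```

Selecting a[href] rather than a avoids a KeyError from a['href'] when an anchor has no href attribute; using a.get('href') inside the loop is an alternative that returns None in that case.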
