Scraping data from multiple elements with the same class name using BeautifulSoup


Question

I'm practicing web scraping on a real-estate website, and I want to scrape all the addresses listed under recent sales. The page is https://www.compass.com/agents/irene-vuong/, and the relevant part of its HTML looks like this:

<div class="profile-active-listings" role="tabpanel" id="active-listings-sales">
    <div class="card-content">
      <a class="card-title" href="/listing" data-tn="label-address"> 111 East 35th </a>
                                            ........
<div class="textIntent-headline1"> Recent Sales</div>
    <div class="card-content">
      <a class="card-title" href="/morelisting" data-tn="label-address"> East 4th </a>

I'm trying to get all the addresses using the code below:

for i in range(0, 30):
    h = soup.findAll('a', {'class':'card-title'})[i]
    print(h)

However, I get this error:

IndexError: list index out of range

I get the first few addresses, but only the ones that appear before "Recent Sales". The code picks up addresses from the first part of the page, not from the whole page. How do I get all the addresses?

Recommended answer

The findAll method returns a list of all elements that match your search criteria.

In your case, it returns a list of length 2.

You are then iterating over the indexes 0-29 and looking each one up in that two-element list, so the lookup raises an IndexError as soon as the index reaches 2.

Your code should loop over the list itself instead:

for x in soup.findAll('a', {'class':'card-title'}):
  print(x)
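
To make the point concrete, here is a minimal sketch that runs the same search against a tidied-up copy of the HTML fragment from the question. The closing tags in the `html` string and the `addresses` variable are assumptions added for illustration; the live page may well be structured differently. It uses `find_all`, the current spelling of `findAll`, and `get_text(strip=True)` to pull out just the address text.

```python
from bs4 import BeautifulSoup

# Assumed, tidied-up version of the fragment from the question
# (closing tags added so the snippet is self-contained).
html = """
<div class="profile-active-listings" role="tabpanel" id="active-listings-sales">
  <div class="card-content">
    <a class="card-title" href="/listing" data-tn="label-address"> 111 East 35th </a>
  </div>
</div>
<div class="textIntent-headline1"> Recent Sales</div>
<div class="card-content">
  <a class="card-title" href="/morelisting" data-tn="label-address"> East 4th </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ is used because "class" is a reserved word in Python.
links = soup.find_all("a", class_="card-title")

# Loop over whatever was found instead of assuming a fixed count of 30.
addresses = [link.get_text(strip=True) for link in links]
print(addresses)  # ['111 East 35th', 'East 4th']
```

Because the loop runs over the result list directly, it works whether the search finds two matches or two hundred, and it simply does nothing when there are none.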
