Scraping data from multiple elements with the same class name using BeautifulSoup
Question
I'm practicing scraping on a real-estate website, and I want to scrape all the addresses listed under recent sales. For example, part of the site's HTML looks like this (url = https://www.compass.com/agents/irene-vuong/):
<div class="profile-active-listings" role="tabpanel" id="active-listings-sales">
<div class="card-content">
<a class="card-title" href="/listing" data-tn="label-address"> 111 East 35th </a>
........
<div class="textIntent-headline1"> Recent Sales</div>
<div class="card-content">
<a class="card-title" href="/morelisting" data-tn="label-address"> East 4th </a>
I'm trying to get all the addresses using the code below:
for i in range(0, 30):
    h = soup.findAll('a', {'class':'card-title'})[i]
    print(h)
However, I get this error:
IndexError: list index out of range
I get the first few addresses, but only the ones that appear before "Recent Sales". It only gets addresses from the first part of the page, not the entire page. How do I get all the addresses?
Recommended answer
The findAll method returns a list of all elements that match your search criteria.
In your case, it returns a list of length 2.
You are then iterating over the indexes 0-29 and looking up each one in that list of length 2.
As soon as the index reaches 2, the lookup fails, hence the IndexError.
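To make the failure concrete, here is a minimal sketch that reproduces it without any scraping. The list `matches` is a hypothetical stand-in for the two-element result that findAll returns; its contents are illustrative only.

```python
# Hypothetical stand-in for the two-element result of findAll.
matches = ["<a> 111 East 35th </a>", "<a> East 4th </a>"]

printed = []        # items retrieved before the failure
error_index = None  # index at which the lookup failed

try:
    for i in range(0, 30):
        # Works for i = 0 and i = 1, then raises IndexError.
        printed.append(matches[i])
except IndexError:
    error_index = i

print(printed)
print(error_index)
```

The loop manages the indexes that exist and fails on the first one that does not, which matches the behaviour described in the question: a few results are printed, then the exception is raised.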
Your code should read more like:
for x in soup.findAll('a', {'class':'card-title'}):
    print(x)
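For a version that runs on its own, the sketch below applies the same idea to a small inline HTML string. The markup is a simplified, hypothetical copy of the snippet in the question (closing tags added), not the live page, and it assumes the `bs4` package is installed. It uses `find_all`, the current name for `findAll`, and extracts just the address text from each link.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified HTML modelled on the snippet in the question.
html = """
<div class="profile-active-listings" role="tabpanel" id="active-listings-sales">
  <div class="card-content">
    <a class="card-title" href="/listing" data-tn="label-address"> 111 East 35th </a>
  </div>
  <div class="textIntent-headline1"> Recent Sales</div>
  <div class="card-content">
    <a class="card-title" href="/morelisting" data-tn="label-address"> East 4th </a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching element, however many there are.
links = soup.find_all("a", class_="card-title")

# Iterate over the result directly instead of over a fixed index range.
addresses = [a.get_text(strip=True) for a in links]

print(len(links))
print(addresses)
```

Iterating over the result itself means the loop runs exactly once per match, so it cannot index past the end. If fewer addresses come back than the page shows in a browser, the missing elements are probably not in the HTML that was fetched, and no change to the loop will recover them.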