Nested For Loop with Unequal Entities

Problem description

I would like to scrape the contents of a website with a structure similar to:

https://www.wellstar.org/locations/pages/default.aspx

Using the provided website as a framework, I would like to extract each location's name together with the heading associated with that location. I want to be able to produce the following:

WellStar Hospitals

WELLSTAR ATLANTA MEDICAL CENTER

WellStar Hospitals

WELLSTAR ATLANTA MEDICAL CENTER SOUTH

...

WellStar Health Parks

ACWORTH HEALTH PARK

...

So far I have tried a nested for loop:

for type in soup.find_all("h3",class_="WebFont SpotBodyGreen"):
    for name in soup.find_all("div",class_="PurpleBackgroundHeading"):
        print(type.text, name.text)

The loop above returns duplicates because the two loops are independent: every name is paired with every type, regardless of how they are grouped on the website. Any help, whether code or recommended resources for handling this task, would be greatly appreciated.
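To illustrate why the duplicates appear, here is a minimal sketch that needs no network access. The two lists are hypothetical sample data standing in for the results of the two `find_all()` calls; they are not scraped from the site.

```python
# Hypothetical sample data standing in for the two find_all() results.
types = ["WellStar Hospitals", "WellStar Health Parks"]
names = [
    "WellStar Atlanta Medical Center",
    "WellStar Cobb Hospital",
    "Acworth Health Park",
]

# Two independent loops pair every type with every name
# (a Cartesian product), so each name shows up once per type.
pairs = [(t, n) for t in types for n in names]

print(len(pairs))  # 2 types * 3 names = 6 pairs
for t, n in pairs:
    print(t, "|", n)
```

Because the inner loop searches the whole document rather than only the block under the current heading, the pairing ignores the page layout entirely.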

Recommended answer

You need a way to group the locations by heading. To do this, process each block separately: read the block's title, then collect that block's locations into a dictionary keyed by the title:

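from pprint import pprint

import requests
from bs4 import BeautifulSoup

# Download the page and parse it.
url = "https://www.wellstar.org/locations/pages/default.aspx"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Map each section title to the list of location names in that section.
d = {}
# Each table row holds one section: an <h3> title plus its locations.
for row in soup.select(".WS_Content > .WS_LeftContent > table > tr"):
    title = row.h3.get_text(strip=True)

    # Search only inside this row, so names stay paired with their own title.
    d[title] = [item.get_text(strip=True) for item in row.select(".PurpleBackgroundHeading a")]

pprint(d)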

Prints (pretty-printed with pprint()):

{'WellStar Community Hospice': ['Tranquility at Cobb Hospital',
                                'Tranquility at Kennesaw Mountain'],
 'WellStar Health Parks': ['Acworth Health Park', 'East Cobb Health Park'],
 'WellStar Hospitals': ['WellStar Atlanta Medical Center',
                        'WellStar Atlanta Medical Center South',
                        'WellStar Cobb Hospital',
                        'WellStar Douglas Hospital',
                        'WellStar Kennestone Hospital',
                        'WellStar North Fulton Hospital',
                        'WellStar Paulding Hospital',
                        'WellStar Spalding Regional Hospital',
                        'WellStar Sylvan Grove Hospital',
                        'WellStar West Georgia Medical Center',
                        'WellStar Windy Hill Hospital'],
 'WellStar Urgent Care Centers': ['WellStar Urgent Care in Acworth',
                                  'WellStar Urgent Care in Kennesaw',
                                  'WellStar Urgent Care in Marietta - Delk '
                                  'Road',
                                  'WellStar Urgent Care in Marietta - East '
                                  'Cobb',
                                  'WellStar Urgent Care in Marietta - '
                                  'Kennestone',
                                  'WellStar Urgent Care in Marietta - Sandy '
                                  'Plains Road',
                                  'WellStar Urgent Care in Smyrna',
                                  'WellStar Urgent Care in Woodstock']}
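The question asked for the heading to be repeated before each location. As a rough sketch, a dictionary like the one above can be flattened into that shape. The sample `d` below copies only a few entries from the printed output, and `flatten` is a hypothetical helper name, not part of the original answer.

```python
# A small sample of the dictionary printed above.
d = {
    "WellStar Health Parks": ["Acworth Health Park", "East Cobb Health Park"],
    "WellStar Community Hospice": [
        "Tranquility at Cobb Hospital",
        "Tranquility at Kennesaw Mountain",
    ],
}


def flatten(groups):
    """Yield one (heading, location) pair per location."""
    for heading, locations in groups.items():
        for location in locations:
            yield heading, location


rows = list(flatten(d))

# Print the heading followed by the location, once per location.
for heading, location in rows:
    print(heading)
    print(location)
```

Here the inner loop walks only the locations stored under the current heading, which is what keeps each name paired with its own type.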
