Can't get data in table form using Selenium Python

Problem description

I am new to scraping with Selenium in Python. I can retrieve some of the data, but I want it in table form, as it is displayed on the web page:

Here is what I have so far:

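import time

from selenium import webdriver

url='https://definitivehc.maps.arcgis.com/home/item.html?id=1044bb19da8d4dbfb6a96eb1b4ebf629&view=list&showFilters=false#data'

browser = webdriver.Chrome(r"C:\task\chromedriver")
browser.get(url)
# Fixed wait so the JavaScript-rendered grid has time to load.
time.sleep(25)

# Selenium 3 API; Selenium 4 uses browser.find_elements(By.XPATH, ...) instead.
rows_in_table = browser.find_elements_by_xpath('//table[@class="dgrid-row-table"]//tr[th or td]')
for element in rows_in_table:
    # Removing the newlines is what runs the cells of each row together.
    print(element.text.replace('\n', ''))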

Summary of the result:

Hospital NameHospital TypeCityState AbrvZip CodeCounty NameState Name
Phoenix VA Health Care System (AKA Carl T Hayden VA Medical Center)VA HospitalPhoenixAZ85012MaricopaArizona040130401362620000.001
Southern Arizona VA Health Care SystemVA HospitalTucsonAZ85723PimaArizona04019040192952952202.002
VA Central California Health Care SystemVA HospitalFresnoCA93703FresnoCalifornia060190601954542202.003
VA Connecticut Healthcare System - West Haven Campus (AKA West Haven VA Medical Center)VA HospitalWest HavenCT6516New HavenConnecticut09009090092162161102.004

I would really appreciate help from an expert on this. Thanks.

Recommended answer

This is an updated version of @Andrej's answer. Instead of printing the table, this code downloads it and saves it as an Excel document.

import json
import requests
import pandas as pd

config_url = 'https://definitivehc.maps.arcgis.com/sharing/rest/portals/self?culture=en-us&f=json'
page_url = 'https://services7.arcgis.com/{_id}/arcgis/rest/services/Definitive_Healthcare_USA_Hospital_Beds/FeatureServer/0/query?f=json&where=1%3D1&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&orderByFields=OBJECTID%20ASC&resultOffset={offset}&resultRecordCount=50&cacheHint=true&quantizationParameters=%7B%22mode%22%3A%22edit%22%7D'

PAGE_SIZE = 50  # must match resultRecordCount in page_url

# The 'id' from the portal config fills the {_id} slot in page_url.
_id = requests.get(config_url).json()['id']
required = []
offset = 0
while True:
    data = requests.get(page_url.format(_id=_id, offset=offset)).json()

    # Uncomment this to print the raw data for the current page:
    # print(json.dumps(data, indent=4))

    features = data.get('features', [])
    required.extend(f['attributes'] for f in features)

    # A short (or empty) page means there is nothing left to fetch.
    if len(features) < PAGE_SIZE:
        break

    offset += PAGE_SIZE

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0).
df = pd.json_normalize(required)
# Writing .xlsx requires an Excel engine such as openpyxl to be installed.
with pd.ExcelWriter('dataFunction.xlsx') as writer:
    df.to_excel(writer)
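
The paging logic can be checked on its own, without touching the network. The following is only a minimal sketch under stated assumptions: `fetch_all` and `make_fake_server` are hypothetical names that do not appear in the answer, and the fake server merely imitates the `{'features': [{'attributes': ...}]}` shape that the code above reads from the ArcGIS response. The loop stops as soon as a page comes back with fewer records than the page size, so it also terminates when the total is an exact multiple of 50.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 50  # assumed to match resultRecordCount=50 in the query URL


def fetch_all(fetch_page: Callable[[int], Dict], page_size: int = PAGE_SIZE) -> List[Dict]:
    """Collect the 'attributes' dict of every feature, one page at a time.

    fetch_page(offset) is a hypothetical callable that returns the decoded
    JSON of a single page starting at the given record offset.
    """
    records: List[Dict] = []
    offset = 0
    while True:
        features = fetch_page(offset).get('features', [])
        records.extend(f['attributes'] for f in features)
        # A short or empty page means the previous page was the last full one.
        if len(features) < page_size:
            break
        offset += page_size
    return records


def make_fake_server(total: int, page_size: int = PAGE_SIZE) -> Callable[[int], Dict]:
    """Build a stand-in for the real service that holds `total` made-up records."""
    def fetch_page(offset: int) -> Dict:
        stop = min(offset + page_size, total)
        return {'features': [{'attributes': {'OBJECTID': n}}
                             for n in range(offset + 1, stop + 1)]}
    return fetch_page


# 100 is an exact multiple of the page size; the loop still terminates.
records = fetch_all(make_fake_server(100))
print(len(records))
```

With the real service, the same function could presumably be called as `fetch_all(lambda offset: requests.get(page_url.format(_id=_id, offset=offset)).json())`, reusing `page_url` and `_id` from the code above.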

I tried this and uploaded the resulting Excel sheet HERE (link to Excel sheet).
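
For anyone who prefers to stay with the Selenium approach from the question: the cells run together there because `replace('\n', '')` deletes the line breaks between them. The sketch below splits on those line breaks instead and writes the rows as CSV. It assumes that each row's `.text` puts one cell per line, which the `replace` call in the question suggests but which has not been verified against the live page. `split_cells` and `rows_to_csv` are hypothetical helper names, and the sample strings are reconstructed from the output shown in the question.

```python
import csv
import io
from typing import Iterable, List


def split_cells(row_text: str) -> List[str]:
    """Split one row's text into cells, assuming one cell per line."""
    return [cell.strip() for cell in row_text.splitlines()]


def rows_to_csv(row_texts: Iterable[str]) -> str:
    """Return CSV text with one line per table row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    for text in row_texts:
        writer.writerow(split_cells(text))
    return buffer.getvalue()


# Made-up stand-ins for element.text values, one string per table row.
sample = [
    'Hospital Name\nHospital Type\nCity\nState Abrv\nZip Code\nCounty Name\nState Name',
    'Southern Arizona VA Health Care System\nVA Hospital\nTucson\nAZ\n85723\nPima\nArizona',
]
print(rows_to_csv(sample))
```

In the question's script this would be called as `rows_to_csv(element.text for element in rows_in_table)`. If the page does not separate cells with newlines, selecting the `td` and `th` elements inside each row would be the more reliable route.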
