How to Extract/Scrape option values from dynamic dropdowns in Python?


Problem description

I am trying to extract the data from a Web page where the options in the dropdown lists are dynamically loaded based on our input. I am using Selenium Webdriver to extract the data from the dropdowns. Please see the screenshots below.

Dropdown 1 - State

Dropdown 2 - City

Dropdown 3 - Station

The City dropdown options are loaded once I select a state, and the Station dropdown is loaded after I select a city.

So far I have been able to extract the station names with this code.

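import time

from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# `driver`, `cityDropdown` and `cityOptions` are created earlier in the script (not shown).
citiesList = []
stationNameList = []
siteIdList = []

# Skip the first option (presumably the placeholder entry).
for city in cityOptions[1:]:
    citiesList.append(city.text)

stationDropDown = driver.find_element(By.XPATH, "//select[contains(@id,'stations')]")
stationOptions = stationDropDown.find_elements(By.TAG_NAME, 'option')

for ele in citiesList:
    # Type the city name into the city dropdown and confirm it.
    cityDropdown.send_keys(ele, Keys.RETURN)
    time.sleep(2)  # give the station dropdown time to reload
    stationDropDown.click()
    print(stationDropDown.text)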

State Options

City Options

Option values from station dropdown

Can anyone please help me extract the siteIds for every state and city?

Solution

Try the approach below using Python's requests library. It is simple, straightforward, reliable, and fast, and it needs less code than driving the page with Selenium. I found the API URL on the website itself by inspecting the Network tab in Google Chrome's developer tools.

What exactly the script below does:

  1. First, it sends a POST request to the API URL with the payload (both the POST method and the payload are essential) and gets the data in return.
  2. After getting the data, the script parses the JSON using the json.loads function.
  3. Finally, it iterates over the list of stations one by one and prints the details: state name, city name, station name, and station site ID.
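
The payload mentioned in step 1 is a Base64 string. As a minimal sketch that is not part of the original answer, decoding it with the standard library shows a small JSON object that appears to hold a timestamp in milliseconds and a time-zone offset in minutes. The helper names `decode_payload` and `build_payload` are hypothetical, and whether the server accepts a freshly generated payload is an assumption that has not been verified here.

```python
import base64
import json
import time

# Payload string copied from the answer's script.
PAYLOAD = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='


def decode_payload(payload):
    """Decode a Base64 payload string into a Python dict."""
    return json.loads(base64.b64decode(payload).decode('utf-8'))


def build_payload(time_ms=None, tz_offset_minutes=-330):
    """Hypothetical helper: encode a payload with the same two keys."""
    if time_ms is None:
        time_ms = int(time.time() * 1000)
    body = {'time': time_ms, 'timeZoneOffset': tz_offset_minutes}
    # Compact separators avoid the spaces json.dumps inserts by default.
    raw = json.dumps(body, separators=(',', ':'))
    return base64.b64encode(raw.encode('utf-8')).decode('ascii')


decoded = decode_payload(PAYLOAD)
print(decoded)  # {'time': 1603104573463, 'timeZoneOffset': -330}
```

If the endpoint ever rejects the hard-coded string, regenerating it with something like `build_payload()` would be one thing to try, again assuming the server only checks for these two fields.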

Network call tab

Output of the code below.

import json

import requests


def scrape_aqi_site_id():
    URL = 'https://app.cpcbccr.com/aqi_dashboard/aqi_station_all_india'  # API URL
    payload = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='  # Unique payload fetched from the network request
    # POST request to get the data using the URL and payload.
    # Note: verify=False disables TLS certificate verification.
    response = requests.post(URL, data=payload, verify=False)
    result = json.loads(response.text)  # parse the JSON response body
    extracted_states = result['stations']
    for state in extracted_states:  # loop over the extracted states and their station data
        print('=' * 120)
        print('Scraping station data for state : ' + state['stateID'])
        for station in state['stationsInCity']:  # loop over each station listed for the state
            print('-' * 100)
            print('Scraping data for city and its station : City (' + station['cityID'] + ') & station (' + station['name'] + ')')
            print('City : ' + station['cityID'])
            print('Station Name : ' + station['name'])
            print('Station Site Id : ' + station['id'])
            print('-' * 100)
        print('Scraping of data for state : (' + state['stateID'] + ') is completed, now going for another one...')
        print('=' * 120)


scrape_aqi_site_id()
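
Printing is enough for a quick check, but since the question asks for the site IDs per state and city, collecting the values into rows may be more useful. The following is a minimal sketch that assumes the parsed response has the shape the script above relies on (`stations`, `stateID`, `stationsInCity`, `cityID`, `name`, `id`). The function name `flatten_stations` and the sample data are hypothetical and only illustrate the structure; they are not real API output.

```python
def flatten_stations(result):
    """Return one dict per station from the parsed API response."""
    rows = []
    for state in result.get('stations', []):
        for station in state.get('stationsInCity', []):
            rows.append({
                'state': state['stateID'],
                'city': station['cityID'],
                'station': station['name'],
                'site_id': station['id'],
            })
    return rows


# Hypothetical sample mirroring the keys used by the script; not real data.
sample = {
    'stations': [
        {
            'stateID': 'State A',
            'stationsInCity': [
                {'cityID': 'City 1', 'name': 'Station X', 'id': 'site_001'},
                {'cityID': 'City 2', 'name': 'Station Y', 'id': 'site_002'},
            ],
        },
    ],
}

rows = flatten_stations(sample)
for row in rows:
    print(row['state'], row['city'], row['station'], row['site_id'])
```

With the real response, `flatten_stations(result)` could be called right after the `json.loads` step, and the resulting rows filtered by state or city or written out with the standard `csv` module.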
