How to use Selenium to go from one URL tab to another before scraping?

Question

I wrote the following code hoping to open a new tab with a few parameters set and then scrape the data table on that new tab.

import time

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

#Open Webpage
url = "https://www.website.com"
driver = webdriver.Chrome(executable_path=r"C:\mypathto\chromedriver.exe")
driver.get(url)

#Click Necessary Parameters
driver.find_element_by_partial_link_text('Output').click()
driver.find_element_by_xpath('//*[@id="flexOpt"]/table/tbody/tr/td[2]/input[3]').click()
driver.find_element_by_xpath('//*[@id="flexOpt"]/table/tbody/tr/td[2]/input[4]').click()
driver.find_element_by_xpath('//*[@id="repOpt"]/table[2]/tbody/tr/td[2]/input[4]').click()
time.sleep(2)

driver.find_element_by_partial_link_text('Dates').click()
driver.find_element_by_xpath('//*[@id="RangeOption"]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[1]/td[2]/select/option[2]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[1]/td[3]/select/option[1]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[1]/td[4]/select/option[1]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[2]/td[2]/select/option[2]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[2]/td[3]/select/option[31]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[2]/td[4]/select/option[1]').click()
time.sleep(2)

driver.find_element_by_partial_link_text('Groupings').click()
driver.find_element_by_xpath('//*[@id="availFld_DATE"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_LOCID"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_STATE"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_DDSO_SA"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_CLASS_ID"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_REGION"]/a/img').click()
time.sleep(2)

driver.find_element_by_partial_link_text('Run').click()
time.sleep(2)

df_url = driver.switch_to_window(driver.window_handles[0])
page = requests.get(df_url).text
soup = BeautifulSoup(page, features = 'html5lib')
soup.prettify()

However, the following error pops up when I run it.

requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

I should add that the new tab always opens the same URL regardless of the parameters. In other words, if the new tab opens www.website.com/b the first time, it also opens www.website.com/b the third, fourth, etc. time, even when I change the parameters. Any thoughts?

Answer

The problem is here:

df_url = driver.switch_to_window(driver.window_handles[0])
page = requests.get(df_url).text

df_url does not refer to the page's URL: switch_to_window() switches the active window but returns None, which is why requests reports Invalid URL 'None'. To get the URL, call driver.current_url after switching windows; it returns the URL of the active window.
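
A minimal sketch of the fix, with hypothetical helper names pick_new_handle and read_results_tab. It assumes the results open in one new tab. Instead of relying on window_handles[0] (usually the original tab, and the W3C WebDriver spec does not guarantee handle order), it picks the handle that differs from the original one, switches, and only then reads driver.current_url. It also returns driver.page_source: since the results URL stays the same whatever parameters are chosen, the parameters probably live in the browser session rather than in the URL, so a fresh requests.get() on that URL would likely not see them, while the page already loaded in the browser does.

```python
def pick_new_handle(window_handles, original_handle):
    """Return a window handle other than the original tab.

    The WebDriver spec leaves the order of window_handles unspecified, so this
    compares against the original handle instead of trusting an index.
    """
    others = [h for h in window_handles if h != original_handle]
    if not others:
        raise RuntimeError("The results tab has not opened yet")
    # If several extra tabs are open, take the last one listed
    # (usually, but not guaranteed to be, the newest).
    return others[-1]


def read_results_tab(driver, original_handle):
    """Switch `driver` (a Selenium WebDriver) to the results tab.

    Returns (url, html). switch_to.window() returns None, so the URL is read
    from driver.current_url only after the switch.
    """
    driver.switch_to.window(pick_new_handle(driver.window_handles, original_handle))
    # page_source is the page as this browser session loaded it, so no second
    # HTTP request (and no lost session state) is involved.
    return driver.current_url, driver.page_source
```

With a live driver, store original = driver.current_window_handle before clicking Run, wait for the tab to open (the existing time.sleep(2) or an explicit wait), then call df_url, html = read_results_tab(driver, original) and parse html with BeautifulSoup(html, features='html5lib') instead of calling requests.get.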

A few other pointers:

  • Finding elements by XPath is relatively inefficient (source).
  • Instead of time.sleep, consider using explicit waits.
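
A minimal sketch of both pointers, assuming Selenium's WebDriverWait and expected_conditions helpers; the function names click_when_clickable and wait_for_new_window are hypothetical. Each wait returns as soon as its condition holds instead of always sleeping a fixed two seconds, and raises Selenium's TimeoutException if the condition never holds.

```python
def click_when_clickable(driver, by, value, timeout=10):
    """Wait up to `timeout` seconds for the element to be clickable, then click it.

    Replaces a fixed time.sleep(): the wait returns as soon as the condition
    holds and raises selenium's TimeoutException if it never does.
    """
    # Imported lazily so the helpers can be defined even where Selenium isn't installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(EC.element_to_be_clickable((by, value))).click()


def wait_for_new_window(driver, handles_before, timeout=10):
    """Wait until more windows are open than were listed in `handles_before`."""
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(EC.new_window_is_opened(handles_before))
```

For example, with from selenium.webdriver.common.by import By, the call click_when_clickable(driver, By.ID, "RangeOption") locates the element by its id directly rather than through the XPath '//*[@id="RangeOption"]'. To wait for the results tab, record before = driver.window_handles ahead of clicking Run and then call wait_for_new_window(driver, before).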
