How to open each product within a website in a new tab for scraping using Selenium through Python
Problem description
I am scraping a website with Selenium: https://www.medline.com/catalog/category-products.jsp?itemId=Z05-CA02_03&N=111079+4294770643&iclp=Z05-CA02_03
For a single page and a single product I can already scrape by passing the product URL directly. Now I am trying to automate this with Selenium: it should select the products on the page one by one, open each product details page and scrape it (the scraping itself is done with BeautifulSoup), and once every product on the page is done it should move to the next page. Here is a product URL reached from the base URL: https://www.medline.com/product/SensiCare-Powder-Free-Nitrile-Exam-Gloves/SensiCare/Z05-PF00342?question=&index=P1&indexCount=1
Here is my code:
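from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_experimental_option('useAutomationExtension', False)
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('C:/Users/ptiwar34/Documents/chromedriver.exe'), options=chromeOptions)
driver.get("https://www.medline.com/catalog/category-products.jsp?itemId=Z05-CA02_03&N=111079+4294770643&iclp=Z05-CA02_03")
while True:
    try:
        # Wait for the first product title link in the gallery view and click it
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[contains(@class, 'resultGalleryViewRow')]//div[@class='medGridProdTitle']//a[@href]"))).click()
        print("Clicked for next page")
    except TimeoutException:
        print("No more pages")
        break
driver.quit()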
No error is thrown, but it does not open a page for each product. I want to open each product in a new tab, scrape it, close that tab, and then open a new tab for the next product.
Recommended answer
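To open each product from the webpage https://www.medline.com/catalog/category-products.jsp?itemId=Z05-CA02_03&N=111079+4294770643&iclp=Z05-CA02_03 in a new tab and scrape it, first collect the href of every product link, then open each one with window.open(), induce WebDriverWait for number_of_windows_to_be(2), switch to the new tab, scrape it, close it and switch back to the parent window. You can use the following Locator Strategies: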
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=chrome_options)
driver.get("https://www.medline.com/catalog/category-products.jsp?itemId=Z05-CA02_03&N=111079+4294770643&iclp=Z05-CA02_03")
# Collect the href of every product title link up front as plain strings
my_hrefs = [my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'resultGalleryViewRow')]//div[@class='medGridProdTitle']//a")))]
windows_before = driver.current_window_handle  # Store the parent_window_handle for future use
for my_href in my_hrefs:
    driver.execute_script("window.open(arguments[0]);", my_href)  # open the product page in a new tab
    WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))  # Induce WebDriverWait for the number_of_windows_to_be 2
    windows_after = driver.window_handles
    new_window = [x for x in windows_after if x != windows_before][0]  # Identify the newly opened window
    driver.switch_to.window(new_window)  # switch_to the new window
    time.sleep(3)  # crude pause for the page to load; perform your webscraping here
    print(driver.title)  # print the page title or perform your webscraping
    driver.close()  # close the product tab
    driver.switch_to.window(windows_before)  # switch_to the parent_window_handle
driver.quit()  # quit your program
Console Output:
SensiCare Powder-Free Nitrile Exam Gloves | Medline Industries, Inc.
MediGuard Vinyl Synthetic Exam Gloves | Medline Industries, Inc.
CURAD Stretch Vinyl Exam Gloves | Medline Industries, Inc.
CURAD Nitrile Exam Gloves | Medline Industries, Inc.
SensiCare Ice Blue Powder-Free Nitrile Exam Gloves | Medline Industries, Inc.
MediGuard Synthetic Exam Gloves | Medline Industries, Inc.
Accutouch Synthetic Exam Gloves | Medline Industries, Inc.
Aloetouch Ice Powder-Free Nitrile Exam Gloves | Medline Industries, Inc.
Aloetouch 3G Powder-Free Synthetic Exam Gloves | Medline Industries, Inc.
SensiCare Powder-Free Stretch Vinyl Sterile Exam Gloves | Medline Industries, Inc.
CURAD Powder-Free Textured Latex Exam Gloves | Medline Industries, Inc.
Accutouch Chemo Nitrile Exam Gloves | Medline Industries, Inc.
Aloetouch 12" Powder-Free Nitrile Exam Gloves | Medline Industries, Inc.
Ultra Stretch Synthetic Exam Gloves | Medline Industries, Inc.
Generation Pink 3G Synthetic Exam Gloves | Medline Industries, Inc.
SensiCare Extended Cuff Powder-Free Nitrile Exam Gloves | Medline Industries, Inc.
Eudermic MP High-Risk Powder-Free Latex Exam Gloves | Medline Industries, Inc.
Aloetouch Powder-Free Latex Exam Gloves | Medline Industries, Inc.
CURAD Powder-Free Nitrile Exam Gloves | Medline Industries, Inc.
Medline Sterile Powder-Free Latex Exam Gloves | Medline Industries, Inc.
SensiCare Silk Powder-Free Nitrile Exam Gloves | Medline Industries, Inc.
Medline Sterile Powder-Free Latex Exam Glove Pairs | Medline Industries, Inc.
MediGuard 2.0 Nitrile Exam Gloves | Medline Industries, Inc.
Designer Boxed Vinyl Exam Gloves | Medline Industries, Inc.
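One weakness of the loop above is that if the scraping step raises, the product tab stays open and the driver is left focused on it, so the next iteration's number_of_windows_to_be(2) wait would time out. A minimal sketch of a more defensive variant wraps the open/scrape/close cycle in a context manager with try/finally. Note that it swaps in a different technique from the answer: instead of window.open() plus a window-count wait, it uses Selenium 4's driver.switch_to.new_window("tab"), which opens a blank tab and switches to it in one call. The helper name opened_in_new_tab is hypothetical, and this assumes Selenium 4 or later.

```python
from contextlib import contextmanager


@contextmanager
def opened_in_new_tab(driver, url):
    """Hypothetical helper: open url in a new tab, yield the driver focused on it,
    then always close that tab and return to the original one."""
    parent = driver.current_window_handle  # remember the listing-page tab
    driver.switch_to.new_window("tab")  # Selenium 4+: opens a blank tab and switches to it
    try:
        driver.get(url)  # load the product page in the new tab
        yield driver
    finally:
        driver.close()  # closes only the current (product) tab
        driver.switch_to.window(parent)  # give focus back to the listing page


# Usage with the answer's my_hrefs list:
# for my_href in my_hrefs:
#     with opened_in_new_tab(driver, my_href):
#         print(driver.title)  # or perform your webscraping here
```

Because the cleanup sits in a finally clause, the tab is closed and focus restored whether the scraping code in the with block succeeds or raises.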
You can find a couple of relevant detailed discussions in:
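- How to open multiple hrefs within a webtable to scrape through Selenium
- Web scraping JavaScript-rendered content using Selenium in Python
- StaleElementReferenceException even after adding waits while collecting data from Wikipedia using web scraping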
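For the BeautifulSoup part mentioned in the question: once Selenium has switched to a product tab and the page has loaded, the rendered HTML is available as driver.page_source and can be handed to BeautifulSoup at the "perform your webscraping here" step. A minimal sketch, assuming the bs4 package is installed; the helper name parse_product_page is hypothetical, and the fields it reads (the <title> and the first <h1>) are placeholders, since the actual markup of the Medline product pages is not shown here.

```python
from bs4 import BeautifulSoup


def parse_product_page(html):
    """Hypothetical helper: extract a few fields from a product page's HTML.
    The selectors are placeholders; adapt them to the real page markup."""
    soup = BeautifulSoup(html, "html.parser")
    # <title> holds the same text the answer prints via driver.title
    title = soup.title.get_text(strip=True) if soup.title else None
    # Assumption: the product name is in the first <h1> on the page
    h1 = soup.find("h1")
    heading = h1.get_text(strip=True) if h1 else None
    return {"title": title, "heading": heading}


# Usage inside the answer's loop, after switching to the product tab:
# print(parse_product_page(driver.page_source))
```

Keeping the parsing in a plain function of an HTML string means it can be checked against a saved page without starting a browser.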