Get information for products after clicking "Load more"
Question
I have written the following code to get information from a webpage that displays some products; clicking 'Load more' displays more products. When I run the code below, I only get information for the first few products. I think the code is mostly correct and there is a small error somewhere that I am not able to catch. It would be great if someone could help me resolve this. Thanks!
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import requests
import xlsxwriter

driver = webdriver.Chrome(executable_path=r"C:\Users\Home\Desktop\chromedriver.exe")
driver.get("https://justnebulizers.com/collections/nebulizer-accessories")
soup = BeautifulSoup(driver.page_source, 'html.parser')
time.sleep(4)
button= driver.find_element_by_xpath("//a[@class='load-more__btn action_button continue-button']")
button.click()
time.sleep(1)
soup = BeautifulSoup(driver.page_source, 'html.parser')

def cpap_spider(url):
    source_code= requests.get(url)
    plain_text= source_code.text
    soup= BeautifulSoup(plain_text, 'html.parser')
    for link in soup.findAll("a", {"class":"product-info__caption"}):
        href="https://www.justnebulizers.com"+link.get("href")
        #title= link.string
        each_item(href)
        print(href)
        #print(title)

def each_item(item_url):
    global cols_names, row_i
    source_code= requests.get(item_url)
    plain_text= source_code.text
    soup= BeautifulSoup(plain_text, 'html.parser')
    table=soup.find("table", {"class":"tab_table"})
    if table:
        table_rows = table.find_all('tr')
    else:
        row_i+=1
        return
    for row in table_rows:
        cols = row.find_all('td')
        for ele in range(0,len(cols)):
            temp = cols[ele].text.strip()
            if temp:
                # Here if you want then you can remove unwanted characters like : ? from temp
                # For example "Actual Weight" and ""
                if temp[-1:] == ":":
                    temp = temp[:-1]
                # Name of column
                if ele == 0:
                    try:
                        cols_names_i = cols_names.index(temp)
                    except:
                        cols_names.append(temp)
                        cols_names_i = len(cols_names) - 1
                        worksheet.write(0, cols_names_i + 1, temp)
                    continue;
                worksheet.write(row_i, cols_names_i + 1, temp)
    row_i += 1

cols_names=[]
cols_names_i = 0
row_i = 1
workbook = xlsxwriter.Workbook('respiratory_care.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, "href")
cpap_spider("https://justnebulizers.com/collections/nebulizer-accessories")
#each_item("https://www.1800cpap.com/viva-nasal-cpap-mask-by-3b-medical")
workbook.close()
Answer
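Your `cpap_spider` function downloads the page again with `requests.get(url)`, which returns only the HTML the server sends initially, so the products added by "Load more" are never in it (the `soup` you built from `driver.page_source` is never used). You have to keep clicking that button, scrolling down after each click, until it disappears, and then read the products from the Selenium driver. So I used: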
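from selenium.webdriver.common.by import By

while True:
    try:
        driver.find_element(By.XPATH, "//a[@class='load-more__btn action_button continue-button']").click()
        print('button found')
        time.sleep(2)  # give the next batch of products time to load
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        print('scrolled down')
    except Exception:
        # The button is gone (or no longer clickable): all products are loaded
        print('button not found')
        break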
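The loop above stops on any exception and has no upper bound. A minimal sketch of the same "click until the button is gone" idea as a reusable helper, with a cap on the number of clicks so a button that never disappears cannot loop forever. `click_until_gone`, `max_clicks`, `pause`, `ButtonGone` and `fake_click` are hypothetical names; the fake button stands in for the real page so the sketch runs without a browser. With Selenium you would pass something like `lambda: driver.find_element(By.XPATH, xpath).click()` as `click` and `(NoSuchElementException, ElementNotInteractableException)` from `selenium.common.exceptions` as `missing`.

```python
import time
from typing import Callable, Tuple, Type


def click_until_gone(
    click: Callable[[], None],
    missing: Tuple[Type[BaseException], ...],
    max_clicks: int = 100,
    pause: float = 0.0,
) -> int:
    """Call `click` until it raises one of `missing`; return how many clicks succeeded.

    `max_clicks` caps the loop in case the button never goes away, and `pause`
    is the delay after each successful click (the answer uses a fixed 2 seconds).
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            click()
        except missing:
            break  # button is gone: every product has been loaded
        clicks += 1
        time.sleep(pause)
    return clicks


# Hypothetical stand-in for the page: the button disappears after 3 clicks.
class ButtonGone(Exception):
    """Plays the role of Selenium's NoSuchElementException in this sketch."""


remaining = [3]


def fake_click() -> None:
    if remaining[0] == 0:
        raise ButtonGone
    remaining[0] -= 1


loaded = click_until_gone(fake_click, missing=(ButtonGone,))
print(loaded)  # prints 3
```

Passing the exception types explicitly means unexpected errors (a typo in the XPath handling code, a crashed browser) still surface instead of being silently treated as "no more products".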
And I fixed some issues in your code.
This code will load all of the products:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
import requests
import xlsxwriter

driver = webdriver.Chrome(service=Service("chromedriver.exe"))

def cpap_spider(url):
    driver.get(url)
    # Keep clicking "Load more" until the button is gone (or no longer clickable)
    while True:
        try:
            driver.find_element(By.XPATH, "//a[@class='load-more__btn action_button continue-button']").click()
            print('button found')
            time.sleep(2)  # give the next batch of products time to load
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            print('scrolled down')
        except Exception:
            print('button not found')
            break
    # Alternative: collect the links as a list
    # elems = driver.find_elements(By.CLASS_NAME, "product-info__caption")
    # links = [elem.get_attribute('href') for elem in elems]
    # print(links)
    # Read the links from Selenium's DOM, which now contains every loaded product
    for link in driver.find_elements(By.CLASS_NAME, "product-info__caption"):
        # get_attribute("href") already returns an absolute URL, so no domain prefix is needed
        href = link.get_attribute("href")
        #title= link.string
        each_item(href)
        print(href)
        #print(title)

def each_item(item_url):
    global cols_names, row_i
    worksheet.write(row_i, 0, item_url)  # fill the "href" column for this product
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    table = soup.find("table", {"class": "tab_table"})
    if table:
        table_rows = table.find_all('tr')
    else:
        row_i += 1
        return
    for row in table_rows:
        cols = row.find_all('td')
        for ele in range(0, len(cols)):
            temp = cols[ele].text.strip()
            if temp:
                # Strip a trailing ":" from labels, e.g. "Actual Weight:" -> "Actual Weight"
                if temp[-1:] == ":":
                    temp = temp[:-1]
                # The first cell of a row is the column name
                if ele == 0:
                    try:
                        cols_names_i = cols_names.index(temp)
                    except ValueError:
                        # New label: add it as a new column header
                        cols_names.append(temp)
                        cols_names_i = len(cols_names) - 1
                        worksheet.write(0, cols_names_i + 1, temp)
                    continue
                worksheet.write(row_i, cols_names_i + 1, temp)
    row_i += 1  # one spreadsheet row per product

cols_names = []
cols_names_i = 0
row_i = 1
workbook = xlsxwriter.Workbook('respiratory_care.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, "href")
cpap_spider("https://justnebulizers.com/collections/nebulizer-accessories")
#each_item("https://www.1800cpap.com/viva-nasal-cpap-mask-by-3b-medical")
workbook.close()
driver.quit()
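The question's `cpap_spider` builds each URL by prefixing the domain to the `href`. That works with BeautifulSoup, whose `link.get("href")` returns the raw attribute (here a relative path), but Selenium's `get_attribute("href")` returns the resolved absolute URL, so the same prefix produces something like `https://www.justnebulizers.comhttps://...`. A small sketch using the standard library's `urllib.parse.urljoin`, which handles both cases; `product_url`, `BASE_URL` and the `/products/example` path are made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.justnebulizers.com"


def product_url(href: str, base: str = BASE_URL) -> str:
    """Return an absolute URL whether `href` is relative or already absolute."""
    return urljoin(base, href)


# Relative href, as BeautifulSoup's link.get("href") would return it
print(product_url("/products/example"))
# prints https://www.justnebulizers.com/products/example

# Absolute href, as Selenium's get_attribute("href") would return it
print(product_url("https://justnebulizers.com/products/example"))
# prints https://justnebulizers.com/products/example
```

Because `urljoin` leaves absolute URLs untouched, the same helper works whichever library you use to read the links.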