How to scrape a website that has a "Load More" button for loading more content on the page?


Question

from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path=r'C:\Users\gkhat\Downloads\chromedriver.exe')
driver.get('https://www.allrecipes.com/recipes/233/world-cuisine/asian/indian/')
card_titles = driver.find_elements_by_class_name('card__detailsContainer')
button = driver.find_element_by_id('category-page-list-related-load-more-button')
for card_title in card_titles:
    rname = card_title.find_element_by_class_name('card__title').text
    print(rname)

    time.sleep(3)
    driver.execute_script("arguments[0].scrollIntoView(true);", button)
    driver.execute_script("arguments[0].click();", button)
    time.sleep(3)

driver.quit()

The website loads more food cards after the "Load More" button is clicked. The code above scrapes the recipe titles, but I want it to keep scraping titles after the button has been clicked. I tried opening the Network tab and filtering by XHR, but none of the requests returns JSON. What should I do?

Recommended answer

I tried the code below. It works, but I am not sure it is the best way to do it. Note that I dismissed the email pop-ups manually; you will need to find a way to handle them.

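from selenium import webdriver
import time
from selenium.common.exceptions import StaleElementReferenceException

# Selenium 3 style API (executable_path, find_element_by_*), as in the question.
driver = webdriver.Chrome(executable_path="path")
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://www.allrecipes.com/recipes/233/world-cuisine/asian/indian/")

# Print the titles of the cards present on the initial page.
recipes = driver.find_elements_by_class_name("card__detailsContainer")
for rec in recipes:
    name = rec.find_element_by_tag_name("h3").get_attribute("innerText")
    print(name)

loadmore = driver.find_element_by_id("category-page-list-related-load-more-button")
j = 0  # number of dynamically loaded cards already printed
try:
    while loadmore.is_displayed():
        loadmore.click()
        time.sleep(5)  # give the new cards time to load
        # Cards added by "Load More" use a different class name.
        lrec = driver.find_elements_by_class_name("recipeCard__detailsContainer")
        newlist = lrec[j:]  # only the cards added since the last pass
        for rec in newlist:
            name = rec.find_element_by_tag_name("h3").get_attribute("innerText")
            print(name)
        j = len(lrec)  # the next pass starts right after the last printed card
        time.sleep(5)
except StaleElementReferenceException:
    # The button reference went stale because the page re-rendered; stop clicking.
    pass
driver.quit()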
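
The bookkeeping in that loop (click, wait, print only what is new) can be separated from the browser calls, which makes it easier to check without opening a page. The following is only a sketch under assumptions: `scrape_until_exhausted`, `selenium_callbacks`, `read_titles` and `load_more` are hypothetical names, the wiring uses the Selenium 4 locator style (`find_elements(By..., ...)`) instead of the `find_element_by_*` calls above, and the selectors are copied from the answer without being verified against the live site.

```python
import time
from typing import Callable, List, Tuple


def scrape_until_exhausted(
    read_titles: Callable[[], List[str]],
    load_more: Callable[[], bool],
    pause: float = 5.0,
    max_clicks: int = 100,
) -> List[str]:
    """Collect titles, clicking "Load More" until nothing new appears.

    read_titles returns every title currently on the page, in page order.
    load_more clicks the button and returns False once it cannot be clicked.
    """
    titles = list(read_titles())
    for _ in range(max_clicks):
        if not load_more():
            break  # button missing, hidden, or stale
        time.sleep(pause)  # crude fixed wait for the new cards to render
        current = read_titles()
        if len(current) <= len(titles):
            break  # the click produced no new cards
        # Keep only the cards that appeared since the previous pass.
        titles.extend(current[len(titles):])
    return titles


def selenium_callbacks(driver) -> Tuple[Callable[[], List[str]], Callable[[], bool]]:
    """Hypothetical Selenium 4 wiring; selectors are taken from the answer above."""
    # Imported lazily so the loop above stays usable without Selenium installed.
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By

    def read_titles() -> List[str]:
        # Initial and dynamically loaded cards use different class names.
        headings = driver.find_elements(
            By.CSS_SELECTOR,
            ".card__detailsContainer h3, .recipeCard__detailsContainer h3",
        )
        return [h.get_attribute("innerText").strip() for h in headings]

    def load_more() -> bool:
        try:
            button = driver.find_element(
                By.ID, "category-page-list-related-load-more-button"
            )
            if not button.is_displayed():
                return False
            # JavaScript click, as in the question, to sidestep overlays.
            driver.execute_script("arguments[0].click();", button)
            return True
        except WebDriverException:
            # Covers a missing or stale button; treat it as "nothing more to load".
            return False

    return read_titles, load_more
```

With a real browser session this might be used as `read, more = selenium_callbacks(driver)` followed by `print(*scrape_until_exhausted(read, more), sep="\n")`. Looking the button up again on every click avoids relying on a reference that may go stale, and `max_clicks` is just a guard against an endless loop.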
