我的工作仅抓取最后一页，而不是全部抓取 [英] My job scrapes only the last page instead of all of them

查看：78 发布时间：2020/7/28 3:37:40 python python-3.x csv selenium selenium-webdriver

本文介绍了我的工作仅抓取最后一页，而不是全部抓取的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的抓取作业似乎只将CSV写入了网站的最后一页.我认为这是因为它循环遍历所有页面，然后写入到csv.它确实会刮擦元素并将其打印在控制台中.您是否必须循环浏览并立即为每个页面写入csv，因为它无法存储数据?我曾尝试调整代码以适应这种情况，但似乎无法使其正常工作.

My scraping job only seems to write to CSV the last page of the website. I assume this is because it is looping through all pages and then writes to the csv. It does scrape the elements and prints them in the console. Do you have to loop through and write to csv for each page straight away as it cannot store the data? I have tried adjusting my code to accommodate this but I can't seem to get it to work.

谢谢.

我也尝试了不同的方法，但是 https://www.pastebin .ca/3863340

I have also tried a different menthod but the same thing appears to be happening in https://www.pastebin.ca/3863340

from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    import csv
    import requests
    import time
    from selenium import webdriver
    from random import shuffle
    import csv

driver = webdriver.Chrome()
driver.set_window_size(1024, 600)
driver.maximize_window()

driver.get('https://www.bookmaker.com.au/sports/soccer/')

SCROLL_PAUSE_TIME = 0.5


last_height = driver.execute_script("return document.body.scrollHeight")

while True:

    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")


    time.sleep(SCROLL_PAUSE_TIME)


    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

time.sleep(1)

elements = driver.find_elements_by_css_selector(".market-match:nth-child(2) .market-group a , .market-match:nth-child(1) .market-group a")
elem_href1 = [element.get_attribute("href") for element in elements]
print(elem_href1)
print (len(elem_href1))
shuffle(elem_href1)
for link in elem_href1:
    driver.get(link)
    ...
    time.sleep(2)

    # link
    elems = driver.find_elements_by_css_selector("h3 a[Href*='/sports/soccer']")
    elem_href = []
    for elem in elems:
     print(elem.get_attribute("href"))
     elem_href.append(elem.get_attribute("href"))

    # TEAM
    langs = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)")
    langs_text = []

    for lang in langs:
        print(lang.text)
        langs_text.append(lang.text)

    time.sleep(0)

    # odds
    langs1 = driver.find_elements_by_css_selector("a.odds.quickbet")
    langs1_text = []

    for lang in langs1:
        print(lang.text)
        langs1_text.append(lang.text)

    time.sleep(0)

    with open('vtg12.csv', 'a', newline='') as outfile:
        writer = csv.writer(outfile)
        for row in zip(langs1_text, langs_text, elem_href):
            writer.writerow(row)

我的工作仅抓取最后一页，而不是全部抓取 [英] My job scrapes only the last page instead of all of them

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

我的工作仅抓取最后一页，而不是全部抓取 [英] My job scrapes only the last page instead of all of them

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭