Python 网页抓取列表从网页到文本文件 [英] Python Web Scraping List from Webpage to Text File

查看:25
本文介绍了Python 网页抓取列表从网页到文本文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在大学三年级上过 Python 课程,但忘记了很多.对于工作,我被要求尝试找到一种方法来从网站上抓取某个日期.我有一个 python 文件,它对我使用的不同站点执行类似的操作.这是代码:

I took a Python class my junior year of college but have forgotten a lot. For work I was asked to try to find a way to web scrape some date from a website. I have a python file that does something similar for a different site I use. Here is that code:

from bs4 import BeautifulSoup
import io
import requests

soup = 
BeautifulSoup(requests.get("https://servicenet.dewalt.com/Parts/Search?searchedNumber=N365763").content)

rows = soup.select("#customerList tbody tr")
with io.open("data.txt", "w", encoding="utf-8") as f:
   f.write(u", ".join([row.select_one("td a").text for row in rows]))

这将获取该站点的电动工具零件的型号列表.现在我基本上想做同样的事情,但我不知道从哪里开始.该网站是 https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/#/s/BRG//498260/1/y

This gets a list of model numbers for power tool parts for that site. Now I basically want to do the same thing but I don't know where to begin. The site is https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/#/s/BRG//498260/1/y

您点击Where Used"按钮,然后有一个型号列表093412-0011-01"、093412-0011-02"等.我希望将这些号码发送到文本文件用逗号分隔,就像在我的第一个代码中一样(093412-0011-01、093412-0011-02,...")非常感谢任何帮助.谢谢!

You click on the "Where Used" button and then there is a list of model numbers "093412-0011-01", "093412-0011-02", etc. I want those numbers to be sent to a text file separated by commas just like in my first code ("093412-0011-01, 093412-0011-02,...") Any help is much appreciated. Thanks!

推荐答案

我使用 selenium 来导航页面.

I used selenium to be able to navigate pages.

代码:

import io
import time
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium Intializations
driver = webdriver.Chrome()
driver.get('https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/#/s/BRG//498260/1/y')
wait = WebDriverWait(driver, 30)
driver.maximize_window()

# Locating the "Where Used" Button
driver.find_element_by_xpath("//input[@id='aripartsSearch_whereUsedBtn_0'][@class='ariPartListWhereUsed ariImageOverride'][@title='Where Used']").click()
wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="ari_searchResults_Grid"]/ul')))


# Intializing BS4 and looking for the "Show More" Button
soup = BeautifulSoup(driver.page_source, "html.parser")
show = soup.find('li', {'class': 'ari-search-showMore'})

# Keep clicking the "Show More" Button until it is not visible anymore
while not show is None:
    time.sleep(2)
    hidden_element = driver.find_element_by_css_selector('#ari-showMore-unhide')
    if hidden_element.is_displayed():
        print("Element found")
        driver.find_element_by_css_selector('#ari-showMore-unhide').click()
        show = soup.find('li', {'class': 'ari-search-showMore'})
    else:
        print("Element not found")
        break

# Write the data parsed to the text file "data.txt"
with io.open("data.txt", "w", encoding="utf-8") as f:
    rows = soup.findAll('li', {'class': 'ari-ModelByPrompt'})
    for row in rows:
        part = str(row.text).replace(" ", "").replace("\n", "")
        print(part)
        f.write(part + ",")

输出:

Element found
Element found
Element found
Element not found
093412-0011-01
093412-0011-02
093412-0015-01
093412-0039-01
093412-0060-01
093412-0136-01
093412-0136-02
093412-0139-01
093412-0150-01
093412-0153-01
093412-0154-01
093412-0169-01
093412-0169-02
093412-0172-01
093412-0174-01
093412-0315-A1
093412-0339-A1
093412-0360-A1
093412-0636-A1
093412-0669-A1
093412-1015-E1
093412-1039-E1
093412-1060-E1
093412-1236-E1
093412-1236-E2
093412-1253-E1
093412-1254-E1
093412-1269-E1
093412-1274-E1
093412-1278-E1
093432-0035-01
093432-0035-02
093432-0035-03
093432-0036-01
093432-0036-03
093432-0036-04
093432-0037-01
093432-0038-01
093432-0038-03
093432-0041-01
093432-0140-01
093432-0145-01
093432-0149-01
093432-0152-01
093432-0157-01
093432-0158-01
093432-0160-01
093432-0192-B1
093432-0335-A1
093432-0336-A1
093432-0337-A1
093432-0338-A1
093432-1035-B1
093432-1035-E1
093432-1035-E2
093432-1035-E4
093432-1036-B1
093432-1036-E1
093432-1037-E1
093432-1038-B1
093432-1038-E1
093432-1240-B1
093432-1240-E1
093432-1257-E1
093432-1258-E1
093432-1280-B1
093432-1280-E1
093432-1281-B1
093432-1281-E1
093432-1282-B1
093432-1282-E1
093432-1286-B1
093452-0049-01
093452-0141-01
093452-0168-01
093452-0349-A1
093452-1049-B1
093452-1049-E1
093452-1049-E5
093452-1241-E1
093452-1242-E1
093452-1277-E1
093452-1283-B1
093452-1283-E1
09A412-0267-E1
09A413-0201-E1
09A413-0202-E1
09A413-0202-E2
09A413-0202-E3
09A413-0203-E1
09A413-0522-E1
09K432-0022-01
09K432-0023-01
09K432-0024-01
09K432-0115-01
09K432-0116-01
09K432-0116-02
09K432-0117-01
09K432-0118-01
120502-0015-E1

文件内容:

093412-0011-01,093412-0011-02,093412-0015-01,093412-0039-01,093412-0060-01,093412-0136-01,093412-0136-02,093412-0139-01,093412-0150-01,093412-0153-01,093412-0154-01,093412-0169-01,093412-0169-02,093412-0172-01,093412-0174-01,093412-0315-A1,093412-0339-A1,093412-0360-A1,093412-0636-A1,093412-0669-A1,093412-1015-E1,093412-1039-E1,093412-1060-E1,093412-1236-E1,093412-1236-E2,093412-1253-E1,093412-1254-E1,093412-1269-E1,093412-1274-E1,093412-1278-E1,093432-0035-01,093432-0035-02,093432-0035-03,093432-0036-01,093432-0036-03,093432-0036-04,093432-0037-01,093432-0038-01,093432-0038-03,093432-0041-01,093432-0140-01,093432-0145-01,093432-0149-01,093432-0152-01,093432-0157-01,093432-0158-01,093432-0160-01,093432-0192-B1,093432-0335-A1,093432-0336-A1,093432-0337-A1,093432-0338-A1,093432-1035-B1,093432-1035-E1,093432-1035-E2,093432-1035-E4,093432-1036-B1,093432-1036-E1,093432-1037-E1,093432-1038-B1,093432-1038-E1,093432-1240-B1,093432-1240-E1,093432-1257-E1,093432-1258-E1,093432-1280-B1,093432-1280-E1,093432-1281-B1,093432-1281-E1,093432-1282-B1,093432-1282-E1,093432-1286-B1,093452-0049-01,093452-0141-01,093452-0168-01,093452-0349-A1,093452-1049-B1,093452-1049-E1,093452-1049-E5,093452-1241-E1,093452-1242-E1,093452-1277-E1,093452-1283-B1,093452-1283-E1,09A412-0267-E1,09A413-0201-E1,09A413-0202-E1,09A413-0202-E2,09A413-0202-E3,09A413-0203-E1,09A413-0522-E1,09K432-0022-01,09K432-0023-01,09K432-0024-01,09K432-0115-01,09K432-0116-01,09K432-0116-02,09K432-0117-01,09K432-0118-01,120502-0015-E1,

这篇关于Python 网页抓取列表从网页到文本文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆