Click "Download csv" button using Selenium and Beautiful Soup

Problem description

I'm trying to download the csv file from this website: https://invasions.si.edu/nbicdb/arrivals?state=AL&submit=Search+database&begin=2000-01-01&end=2020-11-11&type=General+Cargo&bwms=any

To do so, I need to click the CSV button, which downloads the CSV file. However, I need to do this for multiple links, which is why I want to use Selenium to automate the task of clicking on the link.
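Because every search parameter lives in the URL's query string, the links for several searches can be generated instead of typed by hand. Here is a minimal sketch using the standard library's `urllib.parse.urlencode`. `build_arrivals_url` is a hypothetical helper, and the extra state codes are assumptions (only `AL` appears in the question), presuming the site accepts them in the same `state` parameter:

```python
from urllib.parse import urlencode

# Search endpoint taken from the URL in the question.
BASE_URL = "https://invasions.si.edu/nbicdb/arrivals"


def build_arrivals_url(state, begin="2000-01-01", end="2020-11-11",
                       cargo_type="General Cargo", bwms="any"):
    """Return a search URL with the same query parameters as the question's URL."""
    params = {
        "state": state,
        "submit": "Search database",
        "begin": begin,
        "end": end,
        "type": cargo_type,
        "bwms": bwms,
    }
    # urlencode() encodes spaces as '+', which matches the original URL.
    return BASE_URL + "?" + urlencode(params)


# Hypothetical list of states to fetch; only "AL" comes from the question.
states = ["AL", "FL", "TX"]
urls = [build_arrivals_url(state) for state in states]
```

Each URL in `urls` can then be processed in a loop, either by Selenium or by the requests-based approach in the answer.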

My current code runs, but it does not actually download the CSV file to the designated folder (or anywhere, for that matter).

Here is the code I currently have:

import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

options = webdriver.ChromeOptions() 
options.add_argument("download.default_directory=folder") # Set the download Path
driver = webdriver.Chrome(options=options)

url = 'https://invasions.si.edu/nbicdb/arrivals?state=AL&submit=Search+database&begin=2000-01-01&end=2020-11-11&type=General+Cargo&bwms=any'

driver.get(url)

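# Locate the element whose class list contains "csvbutton" and click it
# (find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH)
csv_button = driver.find_element(By.XPATH, '//*[contains(concat( " ", @class, " " ), concat( " ", "csvbutton", " " ))]')
csv_button.click()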
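A likely reason nothing appears in the folder is that `download.default_directory` is a Chrome preference, not a command-line switch, so passing it to `add_argument` probably has no effect on where files are saved. With Selenium's `ChromeOptions`, preferences are normally supplied through `add_experimental_option("prefs", {...})`. Below is a minimal sketch, assuming Chrome with Selenium 4. `chrome_download_prefs`, `make_chrome_options` and `wait_for_download` are hypothetical helper names. The sketch also assumes Chrome is more reliable with an absolute download path, and it relies on Chrome's convention of keeping in-progress downloads as `*.crdownload` files:

```python
import os
import time


def chrome_download_prefs(download_dir):
    """Chrome preferences that send downloads to download_dir without a prompt."""
    return {
        # Resolve to an absolute path (assumption: relative paths are not reliable).
        "download.default_directory": os.path.abspath(download_dir),
        # Save files directly instead of opening a "Save as" dialog.
        "download.prompt_for_download": False,
    }


def make_chrome_options(download_dir):
    """Return ChromeOptions with the download preferences applied."""
    # Imported here so the other helpers can be used without Selenium installed.
    from selenium import webdriver

    os.makedirs(download_dir, exist_ok=True)
    options = webdriver.ChromeOptions()
    # Preferences go through add_experimental_option("prefs", ...), not add_argument().
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return options


def wait_for_download(download_dir, suffix=".csv", timeout=30.0, poll=0.5):
    """Poll download_dir until a finished file ending in suffix appears.

    Assumes the folder starts out without other matching files. Chrome keeps
    in-progress downloads as *.crdownload files, so this waits until none remain.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        in_progress = any(name.endswith(".crdownload") for name in names)
        finished = sorted(name for name in names if name.endswith(suffix))
        if finished and not in_progress:
            return os.path.join(download_dir, finished[0])
        time.sleep(poll)
    raise TimeoutError(f"no finished {suffix} file in {download_dir} after {timeout}s")


# Example usage (requires selenium and a matching ChromeDriver):
#   driver = webdriver.Chrome(options=make_chrome_options("folder"))
#   driver.get(url)
#   driver.find_element(By.CSS_SELECTOR, "a.csvbutton").click()
#   print(wait_for_download("folder"))
#   driver.quit()
```

Waiting before quitting matters because `click()` returns as soon as the click is sent, not when the file has finished downloading.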

I would appreciate any help with this! Thanks

Recommended answer

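You can solve your problem without clicking the button at all: read the CSV link from the page with requests and Beautiful Soup, then download that link directly: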

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# URL of the initial page with the data
url = 'https://invasions.si.edu/nbicdb/arrivals?state=AL&submit=Search+database&begin=2000-01-01&end=2020-11-11&type=General+Cargo&bwms=any'
# Path of the CSV file in which to store the downloaded data (adjust to your own folder)
csv_file_name = '/Users/eilyasov/Documents/arrivals_data.csv'

# Get the HTML content of the initial page, failing loudly on HTTP errors
response = requests.get(url)
response.raise_for_status()
# Parse the HTML with Beautiful Soup, naming the parser explicitly
soup = BeautifulSoup(response.content, 'html.parser')
# Find the CSV button link; its href is the (relative) URL of the CSV file
csv_link = soup.find('a', class_='csvbutton')
if csv_link is None:
    raise RuntimeError('CSV button not found on the page')
# Resolve the href against the page URL to get the absolute CSV URL
csv_url = urljoin(url, csv_link['href'])
# Download the CSV file and save it to disk
response = requests.get(csv_url)
response.raise_for_status()
with open(csv_file_name, 'wb') as file:
    file.write(response.content)
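The key step in this answer is reading the `href` of the `<a class="csvbutton">` link and turning it into an absolute URL. To check that step without network access (for example against a saved copy of the page) or without installing Beautiful Soup, here is a minimal stdlib-only sketch. It swaps in `html.parser.HTMLParser` for Beautiful Soup and `urllib.parse.urljoin` for string concatenation. `CsvLinkFinder` and `find_csv_url` are hypothetical names, and the sample markup is made up; the real page's link may look different:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkFinder(HTMLParser):
    """Record the href of the first <a> whose class list contains 'csvbutton'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated classes (or be valueless).
        classes = (attr_map.get("class") or "").split()
        if "csvbutton" in classes and attr_map.get("href"):
            self.href = attr_map["href"]


def find_csv_url(html, page_url):
    """Return the absolute URL of the CSV button's link, or None if there is none."""
    finder = CsvLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    # urljoin resolves relative hrefs against the page URL and leaves absolute ones as-is.
    return urljoin(page_url, finder.href)


# Made-up markup for illustration only.
sample_html = '<p><a class="btn csvbutton" href="/nbicdb/arrivals.csv?state=AL">CSV</a></p>'
print(find_csv_url(sample_html, "https://invasions.si.edu/nbicdb/arrivals?state=AL"))
# -> https://invasions.si.edu/nbicdb/arrivals.csv?state=AL
```

Using `urljoin` rather than prefixing `'https://invasions.si.edu'` also keeps working if the page ever switches to absolute links.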
