How can I download something with Selenium and Chrome?


Problem description

As a first step, I tried to set the default download folder.

I tried six options, but none of them worked:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Selenium example for downloading a webpage."""

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import os
import time


def main():
    """Download an opened PDF page."""
    browser = get_browser()
    url = "https://martin-thoma.com/pdf/cv-curriculum-vitae.pdf"
    browser.get(url)  # Open a PDF page
    # el = browser.find_element_by_id("plugin")
    time.sleep(5)
    ActionChains(browser).send_keys(Keys.CONTROL, "s").perform()
    print(browser.current_url)
    time.sleep(60)  # Keep the browser open for 60s


def get_browser():
    """Get the browser (a "driver")."""
    # find the path with 'which chromedriver'
    path_to_chromedriver = ('/home/moose/GitHub/algorithms/scraping/'
                            'venv/bin/chromedriver')
    download_dir = "/home/moose/selenium-download/"
    print("Is directory: {}".format(os.path.isdir(download_dir)))

    fail = 6
    options = None
    desired_caps = None
    if fail == 1:
        # Fail (1)
        os.environ['XDG_DOWNLOAD_DIR'] = download_dir
    elif fail == 2:
        # Fail (2)
        options = webdriver.ChromeOptions()
        options.add_argument("download.default_directory={}"
                             .format(download_dir))
    elif fail == 3:
        # Fail (3)
        options = webdriver.ChromeOptions()
        prefs = {"download.default_directory": download_dir}
        options.add_experimental_option("prefs", prefs)
    elif fail == 4:
        # Fail (4)
        desired_caps = {'prefs':
                        {'download': {'default_directory': download_dir,
                                      'directory_upgrade': "true",
                                      'extensions_to_open': ""}}}
    elif fail == 5:
        # Fail (5)
        desired_caps = {'prefs':
                        {'download.default_directory': download_dir}}
    elif fail == 6:
        # Fail (6)
        desired_caps = {'prefs':
                        {'download': {'default_directory': download_dir,
                                      'directory_upgrade': True,
                                      'extensions_to_open': ""}}}

    browser = webdriver.Chrome(executable_path=path_to_chromedriver,
                               chrome_options=options,
                               desired_capabilities=desired_caps)
    return browser


if __name__ == '__main__':
    main()

I know there are simpler ways to download a PDF by URL. However, my real use case is much more complicated: the download is triggered by a JavaScript-generated click on a link that sits behind a 3-step login process, which is done purely with JavaScript.

So this question has two aspects:

  1. How do I change the default download directory with Selenium and Chrome (on Ubuntu 16.04)?
  2. How do I download an opened PDF? (I tried an action chain, but it doesn't work)

I have Google Chrome Version 59.0.3071.115 (Official Build) (64-bit), downloaded via the pip installer.

Solution

First, you need this import:

from selenium.webdriver.chrome.options import Options

Then replace the whole if block and the browser initialization in get_browser() with this:

chrome_options = Options()
chrome_options.add_experimental_option('prefs', {
    "plugins.plugins_list": [{"enabled":False,"name":"Chrome PDF Viewer"}],
    "download": {
        "prompt_for_download": False,
        "default_directory"  : download_dir
    }
})

browser = webdriver.Chrome(path_to_chromedriver, chrome_options=chrome_options)

(I use Windows but there shouldn't be any differences.)
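
For readers on newer Selenium releases, here is a minimal sketch of the same idea, not a tested drop-in replacement. It assumes Selenium 4, where the driver path is passed through a Service object and the options through the options= keyword rather than chrome_options=. The helper names build_download_prefs, make_browser and wait_for_download are hypothetical, and the "plugins.always_open_pdf_externally" preference is an assumption worth checking against your Chrome version; it is used here in place of the plugins_list entry above. The wait helper relies on Chrome writing unfinished downloads with a .crdownload suffix.

```python
import os
import time


def build_download_prefs(download_dir):
    """Return Chrome preferences that save downloads without prompting."""
    return {
        # Use an absolute path so Chrome does not have to guess the base dir.
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
        # Assumption: asks Chrome to download PDFs instead of opening them
        # in the built-in viewer.
        "plugins.always_open_pdf_externally": True,
    }


def make_browser(download_dir, path_to_chromedriver):
    """Start Chrome with the download preferences (Selenium 4 style)."""
    # Imported here so the other helpers can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_download_prefs(download_dir))
    service = Service(executable_path=path_to_chromedriver)
    return webdriver.Chrome(service=service, options=options)


def wait_for_download(download_dir, timeout=60.0, poll=0.5):
    """Wait until download_dir holds only finished files; return the first.

    Files ending in .crdownload are treated as downloads still in progress.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        done = sorted(n for n in names if not n.endswith(".crdownload"))
        if done and len(done) == len(names):
            return done[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "No finished download in {} after {}s".format(download_dir, timeout)
            )
        time.sleep(poll)
```

With these helpers, main() could call make_browser(download_dir, path_to_chromedriver), open the URL, and then call wait_for_download(download_dir) instead of sleeping for a fixed 60 seconds. The wait helper assumes the download directory starts out empty.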
