Can't download PDF with Selenium WebDriver + Firefox


Question

I have a Selenium script that, as part of its execution, needs to download a PDF; the download is necessary because the PDF is used later on. I used the profile preferences method to get the file to download, and this has been working fine on the virtual machine I use for development. However, after moving the script to the live server, it does not download the required PDF at all. Here are the lines I use to set up the Firefox profile:

from selenium import webdriver

# foldername is the download directory, defined earlier in the script
fxProfile = webdriver.FirefoxProfile()
fxProfile.set_preference("browser.download.folderList", 2)  # 2 = use browser.download.dir
fxProfile.set_preference("browser.download.manager.showWhenStarting", False)
fxProfile.set_preference("browser.download.dir", foldername)
fxProfile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
fxProfile.set_preference("pdfjs.disabled", True)  # do not open PDFs in the built-in viewer
# extra preferences tried on the live machine
fxProfile.set_preference("plugin.scan.Acrobat", "99.0")
fxProfile.set_preference("plugin.scan.plid.all", False)
fxProfile.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
fxProfile.set_preference("browser.helperApps.alwaysAsk.force", False)
driver = webdriver.Firefox(firefox_profile=fxProfile)

On the virtual machine, the preference lines ended at disabling pdfjs, and this worked fine. The lines after that are extras I tried in order to solve the problem on the live machine.

The variable foldername is correct, as the same variable is used to open and write to a log file, which works fine. As far as I can tell, no OS-level window asking to confirm the download is being opened, because I can still direct the script to click on other parts of the site after the download link has been clicked. I am also making sure I give the script enough time to download the file (30+ seconds to download a sub-1 MB PDF on a wired connection should be more than enough).

The problem is that the live machine is a server and as such has no physical screen for me to see exactly what's happening, which makes this much harder to fix. Again, it works fine on my virtual machine, where I can see what's happening, but it fails to download the PDF every single time on the live server, without throwing any sort of error.
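Since the server has no screen, one way to make the failure visible is to replace the fixed 30-second wait with a check on the download folder itself. The following is a minimal sketch, not part of the original script: the helper name wait_for_download and its parameters are hypothetical, and it assumes Firefox keeps a temporary file ending in ".part" in the download folder while a download is still in progress.

```python
import time
from pathlib import Path


def wait_for_download(folder, suffix=".pdf", timeout=30.0, poll=0.5):
    """Return the first finished file ending in `suffix`, or None on timeout.

    Assumption: a download is still running while any "*.part" file
    exists in `folder`; a finished file is non-empty.
    """
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    while True:
        # "*.pdf" does not match "name.pdf.part", so partial files are excluded here
        finished = [p for p in sorted(folder.glob(f"*{suffix}")) if p.stat().st_size > 0]
        in_progress = any(folder.glob("*.part"))
        if finished and not in_progress:
            return finished[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Calling wait_for_download(foldername) after clicking the link and writing the result to the existing log file would at least show whether a file ever appeared on the live server.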

Recommended answer

I solved this problem by passing the Selenium session to the Python requests library and then fetching the PDF from there. I have a longer writeup in this StackOverflow answer, but here's a quick example:

import requests
from selenium import webdriver

pdf_url = "/url/to/some/file.pdf"

# set up the webdriver with whatever options you need
options = webdriver.FirefoxOptions()
driver = webdriver.Firefox(options=options)

# do whatever you need to do to auth/login/click/etc.

# navigate to the PDF URL with Selenium first, in case the PDF link
# issues a redirect; the requests session does not have the browser's cookies yet
driver.get(pdf_url)

# get the URL from Selenium 
current_pdf_url = driver.current_url

# create a requests session
session = requests.Session()

# add Selenium's cookies to requests
selenium_cookies = driver.get_cookies()
for cookie in selenium_cookies:
    session.cookies.set(cookie["name"], cookie["value"])

# Note: If headers are also important, you'll need to use 
# something like seleniumwire to get the headers from Selenium 

# Finally, re-send the request with the requests session
pdf_response = session.get(current_pdf_url)
pdf_response.raise_for_status()  # fail loudly on an HTTP error instead of saving an error page

# access the bytes response from the session
pdf_bytes = pdf_response.content
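The question needs the PDF on disk for later use, so the bytes still have to be written out. Below is a small sketch of that last step, under stated assumptions: the helper name save_pdf is hypothetical, and the check relies on a PDF file starting with the usual "%PDF-" marker, which helps catch the case where the server returned an HTML login or error page instead.

```python
from pathlib import Path


def save_pdf(pdf_bytes, destination):
    """Write pdf_bytes to destination after a basic sanity check."""
    # Assumption: a real PDF starts with "%PDF-"; a login or error page will not.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("response body does not look like a PDF")
    path = Path(destination)
    # create the target folder if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pdf_bytes)
    return path
```

With the example above this could be called as save_pdf(pdf_bytes, "downloads/file.pdf"), where the destination path is only a placeholder.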
