Python Download PDF Embedded in a Page

Views: 57 Published: 2021/12/17 14:00:45 Tags: python pdf web-scraping

Question

I have this link:

http://www.equibase.com/premium/chartEmb.cfm?track=ALB&raceDate=06/17/2002&cy=USA&rn=1

I want to download the embedded PDF.

I have tried the usual approaches with urllib and requests, but they do not work.

from urllib.request import urlopen

url = "http://www.equibase.com/premium/chartEmb.cfm?track=ALB&raceDate=06/17/2002&cy=USA&rn=1"

# Fetch the URL and write the raw response body to disk.
with urlopen(url) as response, open("document.pdf", "wb") as file:
    file.write(response.read())

I also tried to locate the direct link to the PDF, but that did not work either.

Internal link:

http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=A&BorP=P&TID=ALB&CTRY=USA&DT=06/17/2002&DAY=D&STYLE=EQB&eqbPDFChartPlus.pdf

Solution

Using Selenium with a custom Chrome profile, you can download embedded PDFs with the following code:

Code:

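from time import sleep

from selenium import webdriver


def download_pdf(lnk):
    options = webdriver.ChromeOptions()

    # The backslash must be escaped; a lone backslash before the closing quote is a syntax error.
    download_folder = "C:\\"

    # Disable the built-in PDF viewer so Chrome downloads the PDF instead of rendering it.
    profile = {"plugins.plugins_list": [{"enabled": False,
                                         "name": "Chrome PDF Viewer"}],
               "download.default_directory": download_folder,
               "download.extensions_to_open": ""}

    options.add_experimental_option("prefs", profile)

    print("Downloading file from link: {}".format(lnk))

    # Recent Selenium releases take options=; chrome_options= is the older, deprecated name.
    driver = webdriver.Chrome(options=options)
    driver.get(lnk)

    # Take the last path segment of the URL and drop everything from ".cfm" onward.
    filename = lnk.split("/")[4].split(".cfm")[0]
    print("File: {}".format(filename))

    # Assumption: a short fixed pause gives the download time to finish before the browser closes.
    sleep(5)

    print("Status: Download Complete.")
    print("Folder: {}".format(download_folder))

    driver.close()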

And when I call this function:

download_pdf("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=1&BorP=P&TID=ALB&CTRY=USA&DT=06/17/2002&DAY=D&STYLE=EQB")

This is the output:

>>> Downloading file from link: http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=1&BorP=P&TID=ALB&CTRY=USA&DT=06/17/2002&DAY=D&STYLE=EQB
>>> File: eqbPDFChartPlus
>>> Status: Download Complete.
>>> Folder: C:\
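
The "File:" value in the output is derived by splitting the link on "/" and taking a fixed index, which only works for URLs with exactly this shape. As a minimal sketch (the helper name filename_from_link is hypothetical and not part of the original answer), the same name can be taken from the parsed URL path, so that slashes inside query values such as DT=06/17/2002 cannot shift the result:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def filename_from_link(lnk):
    """Return the last path segment of a URL without its extension."""
    # urlsplit separates the path from the query string, so only
    # "/premium/eqbPDFChartPlus.cfm" is inspected here.
    path = urlsplit(lnk).path
    return PurePosixPath(path).stem


link = ("http://www.equibase.com/premium/eqbPDFChartPlus.cfm"
        "?RACE=1&BorP=P&TID=ALB&CTRY=USA&DT=06/17/2002&DAY=D&STYLE=EQB")
print(filename_from_link(link))  # eqbPDFChartPlus
```

This is only a naming aid: Chrome chooses the name of the file it actually saves, which may differ.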

Take a look at the specific profile:

profile = {"plugins.plugins_list": [{"enabled": False,
                                     "name": "Chrome PDF Viewer"}],
           "download.default_directory": download_folder,
           "download.extensions_to_open": ""}

It disables the Chrome PDF Viewer plugin (which embeds the PDF in the web page), sets the default download folder to the one defined in the download_folder variable, and tells Chrome not to open any file extensions automatically.
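
The profile above relies on the plugins.plugins_list preference. On newer Chrome builds, a commonly used alternative is the plugins.always_open_pdf_externally preference; whether a given Chrome version honors one or the other is an assumption to verify locally. The sketch below is not from the original answer, and the helper names build_pdf_download_prefs and make_driver are hypothetical:

```python
def build_pdf_download_prefs(download_folder):
    """Build a Chrome "prefs" dict that asks Chrome to download PDFs."""
    return {
        # Assumption: on newer Chrome this preference takes the place of
        # disabling the "Chrome PDF Viewer" plugin.
        "plugins.always_open_pdf_externally": True,
        "download.default_directory": download_folder,
        # Assumption: skipping the "Save as" dialog lets the download start unattended.
        "download.prompt_for_download": False,
    }


def make_driver(download_folder):
    """Start Chrome with the preferences above (requires selenium to be installed)."""
    # Imported lazily so build_pdf_download_prefs has no third-party dependency.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_pdf_download_prefs(download_folder))
    return webdriver.Chrome(options=options)


prefs = build_pdf_download_prefs("C:\\")
print(prefs["plugins.always_open_pdf_externally"])  # True
```

Keeping the preferences in a plain function makes it easy to inspect or adjust them before a browser is started.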

After that, when you open the so-called "Internal link", your webdriver automatically downloads the .pdf file to download_folder.
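
Because the download runs in the background, it can help to confirm that a file has actually arrived before closing the browser. The following is a rough sketch under two assumptions: Chrome marks unfinished downloads with a .crdownload suffix, and the download folder contains no older PDFs. The names wait_for_pdf and is_pdf are hypothetical helpers, not part of the original answer:

```python
import os
import time


def is_pdf(path):
    """Check the file signature: PDF files begin with the bytes %PDF-."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


def wait_for_pdf(download_folder, timeout=30.0, poll=0.5):
    """Return the path of a finished PDF in download_folder, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_folder)
        # Assumption: a .crdownload file means Chrome is still writing.
        in_progress = any(name.endswith(".crdownload") for name in names)
        if not in_progress:
            for name in sorted(names):
                path = os.path.join(download_folder, name)
                if os.path.isfile(path) and is_pdf(path):
                    return path
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Called after driver.get(lnk) with the same download_folder, this returns the saved file's path once it is complete. The signature check also catches the case where an HTML page was saved instead of the PDF.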
