如何使用硒从网页下载嵌入的 PDF? [英] How to download embedded PDF from webpage using selenium?

查看:54
本文介绍了如何使用硒从网页下载嵌入的 PDF?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想像这张图片一样使用 selenium 从网页下载嵌入的 PDF.嵌入式 PDF 图像

I want to download embedded PDF from a webpage using selenium just like in this image. Embedded PDF image

例如,这样的页面:https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entity-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html

我尝试了下面提到的代码,但没有成功.

I tried the code mentioned below but it did not work out.

def download_pdf(lnk):

    from selenium import webdriver
    from time import sleep

    options = webdriver.ChromeOptions()

    download_folder = "/*My folder*/"    

    profile = {"plugins.plugins_list": [{"enabled": False,
                                         "name": "Chrome PDF Viewer"}],
               "download.default_directory": download_folder,
               "download.extensions_to_open": ""}

    options.add_experimental_option("prefs", profile)

    print("Downloading file from link: {}".format(lnk))

    driver = webdriver.Chrome('/*Path of chromedriver*/',chrome_options = options)
    driver.get(lnk)
    imp_by1 = driver.find_element_by_id("secondaryToolbarToggle")
    imp_by1.click()
    imp_by = driver.find_element_by_id("secondaryDownload")
    imp_by.click()

    print("Status: Download Complete.")

    driver.close()

download_pdf('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')

感谢任何帮助.

提前致谢!!

推荐答案

开始吧,代码中的描述:

Here You go, description in code:

=^..^=

from selenium import webdriver
import os

# initialise browser
browser = webdriver.Chrome(os.getcwd()+'/chromedriver')
# load page with iframe
browser.get('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')

# find pdf url
pdf_url = browser.find_element_by_tag_name('iframe').get_attribute("src")
# load page with pdf
browser.get(pdf_url)
# download file
download = browser.find_element_by_xpath('//*[@id="download"]')
download.click()

这篇关于如何使用硒从网页下载嵌入的 PDF?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆