VBA - IE Automation - save as PDF isn't working

Problem description

I'm trying to automatically download PDFs (job postings) from a website using IE automation in VBA, but for some reason I can't manage to generate a single PDF.

Doing it manually, by going to the web page and using 'Save target as' on the PDF icon, works fine and gives me a valid PDF, but the automation fails.

I don't understand why and hope someone will be able to give me a hint.

Thanks

VeeBee

Please find below the code I have so far (the URLs are public and I picked the offers at random):

Private Declare Function DownloadFilefromURL Lib "urlmon" _
Alias "URLDownloadToFileA" _
(ByVal pCaller As Long, _
ByVal szURL As String, _
ByVal szFileName As String, _
ByVal dwReserved As Long, _
ByVal lpfnCB As Long) As Long

Private Const ERROR_SUCCESS As Long = 0
Private Const BINDF_GETNEWESTVERSION As Long = &H10


Public Function DownloadFile(SourceUrl As String, LocalFile As String) As Boolean
    ' URLDownloadToFile's dwReserved argument must be 0; it does not take BINDF flags
    DownloadFile = DownloadFilefromURL(0&, SourceUrl, LocalFile, 0&, 0&) = ERROR_SUCCESS
End Function


Sub TestSavePDF()
    Dim oNav As SHDocVw.InternetExplorer
    Dim oDoc As MSHTML.HTMLDocument
    Dim MyURL As String

    Set oNav = New SHDocVw.InternetExplorer
    oNav.Visible = True
    'Test Altays Client A (Banque de France)
    MyURL = "https://www.recrutement.banque-france.fr/detail-offre/?NoSource=16001&NoSociete=167&NoOffre=2036788&NoLangue=1"
    'Test Altays Client B (Egis)
    '        MyURL = "https://www.altays-progiciels.com/clicnjob/FicheOffreCand.php?PageCour=1&Liste=Oui&Autonome=0&NoOffre=2037501&RefOffrel=&NoFaml=0&NoParam1l=0&NoParam2l=0&NoParam3l=0&NoParam133l=0&NoParam134l=0&NoParam136l=0&NoEntite1=0&NoEntite=&NoPaysl=0&NoRegionl=0&NoDepartementl=0&NoTableOffreLieePl=0&NoTableOffreLieeFl=0&NoNivEtl=0&NoTableCCl=0&NoTableCC2l=0&NoTableCC3l=0&NoTableOffreUnl=0&NoTypContratl=0&NoTypContratProl=0&NoStatutOffrel=&NoUtilisateurl=&RechPleinTextel=#ancre3"


    oNav.navigate MyURL
    'link provided to download the job offer in PDF. when clicked the PDF opens in a new tab
    MyURL = "https://www.altays-progiciels.com/clicnjob/ExportPDFFront.php"

    DownloadFile MyURL, "C:\[...Path...]\test.pdf"

End Sub
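When a download "fails" like this, it helps to check what actually landed on disk. A real PDF normally starts with the bytes `%PDF-`. An error or login page returned by the server usually starts with `<`. The Python sketch below is a language-neutral illustration of that check on the file written by `DownloadFile`. The helper names `classify_header` and `describe_download` are hypothetical.

```python
def classify_header(head: bytes) -> str:
    """Classify the first bytes of a downloaded file.

    Returns "pdf", "html", "empty" or "unknown".
    """
    if not head:
        return "empty"
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte-order mark
        head = head[3:]
    if head.lstrip()[:1] == b"<":  # markup such as <!DOCTYPE html> or <html>
        return "html"
    return "unknown"


def describe_download(path: str) -> str:
    """Read the start of the file at `path` and classify it."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(1024))


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, describe_download(name))
```

A result of "html" would suggest the server answered with a web page instead of the PDF.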

Recommended answer

Shadow DOM and invalid link generation:

On the initial job page, automating a click on the target href doesn't generate a viable page link. This is presumably because the important work actually happens server side.
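One possible reading of "happens server side" (an assumption, not confirmed by the answer): the question's export link `ExportPDFFront.php` carries no offer number at all. The server would then need session state to know which offer to render. A separate `URLDownloadToFile` call may not carry the session that IE built up. Python's `urllib` can test that idea with a single cookie-keeping opener: open an offer page first, then the export URL. Cookies are per domain, so the first request would have to go to an altays-progiciels.com offer page, such as the question's commented-out `FicheOffreCand.php` URL, not to the banque-france.fr wrapper. The function names are hypothetical.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar


def download_offer_pdf(offer_url, export_url, target, timeout=15):
    """Open the offer page, then the export URL with the same cookies.

    Writes the body to `target` and returns True only if it looks like a PDF;
    returns False (writing nothing) if the server sent something else.
    """
    opener, _jar = make_session()
    with opener.open(offer_url, timeout=timeout) as resp:
        resp.read()  # give the server the chance to set its session cookies
    with opener.open(export_url, timeout=timeout) as resp:
        body = resp.read()
    if not body.startswith(b"%PDF-"):
        return False
    with open(target, "wb") as fh:
        fh.write(body)
    return True
```

If `download_offer_pdf` still returns False with the offer and export URLs from the question, the export probably depends on more than a cookie. One example would be a form post made by the page's JavaScript. The browser-driven approach below avoids having to reverse-engineer that.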

[Screenshot: target href]

You can click the actual download button on this page:

[Screenshot: download button]

This launches a new window, which is why Selenium is great: it has methods to switch to the new window. Otherwise, you can use the FindWindow methods I detail later in the answer to find the Save As window.

In this new window you cannot interact with the buttons the way you normally would when scraping, because the required content is not available via the DOM. If you look closely, you will see that the PDF button sits inside a shadow root, which you cannot access. This is a design choice. I do need to investigate selecting through the shadow DOM with the '/deep/' combinator at some point, but I don't think that works in VBA.

[Screenshot: download button]

I am using the Selenium Basic VBA wrapper plus Windows APIs to mimic the on-screen actions of saving as PDF through the Save As window (see the Save As dialog screenshot at the very bottom). In particular, I send the Save keyboard shortcut (Ctrl+S) via SendKeys. This works. I used Spy++ to inspect the window tree and check window class names and titles.

I use SendKeys to open the Save As dialog for the PDF. I then descend the window tree to get handles on two controls. The first is the ComboBox where the file name is entered, so that I can send it a message (i.e. the file name). The second is the Save button, so that I can click it. You may need a longer wait to make sure the download completes. In my opinion this part is still a little buggy, and I hope to improve it.
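The descent described above amounts to following one child per level and stopping as soon as a level is missing. The VBA below always takes the first child. Matching each level by class name is less sensitive to sibling order, and `FindWindowExW` supports it through its `lpszClass` argument. The Python sketch models that logic on a hypothetical in-memory window tree and calls no Windows API. The class-name chain is an assumption consistent with the VBA variable names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Window:
    """Stand-in for a native window: class name, caption and child windows."""
    cls: str
    title: str = ""
    children: List["Window"] = field(default_factory=list)


def find_child(parent: Window, cls: Optional[str] = None,
               title: Optional[str] = None) -> Optional[Window]:
    """First child matching cls/title (None = any), like FindWindowEx(parent, 0, cls, title)."""
    for child in parent.children:
        if (cls is None or child.cls == cls) and (title is None or child.title == title):
            return child
    return None


def descend(root: Window, path: List[str]) -> Optional[Window]:
    """Follow `path` one class name per level; None as soon as a level is missing."""
    node = root
    for cls in path:
        node = find_child(node, cls=cls)
        if node is None:
            return None
    return node


# Assumed chain from the Save As dialog (#32770) down to the file-name box.
EDIT_PATH = ["DUIViewWndClassName", "DirectUIHWND", "FloatNotifySink", "ComboBox", "Edit"]


def build_demo_tree():
    """Hypothetical tree with an extra sibling placed before FloatNotifySink."""
    edit = Window("Edit")
    dialog = Window("#32770", "Save As", [
        Window("DUIViewWndClassName", children=[
            Window("DirectUIHWND", children=[
                Window("CtrlNotifySink"),
                Window("FloatNotifySink", children=[Window("ComboBox", children=[edit])]),
            ]),
        ]),
        Window("Button", "&Save"),
    ])
    return dialog, edit


if __name__ == "__main__":
    dialog, edit = build_demo_tree()
    print(descend(dialog, EDIT_PATH) is edit)  # True
    print(find_child(dialog, cls="Button", title="&Save").title)  # &Save
```

In this made-up tree, taking the first child at the DirectUIHWND level would land on `CtrlNotifySink`, while `descend` still reaches the Edit box.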

[Screenshot: window structure via Spy++]

It is fairly robust. I used Selenium Basic because it makes working with iframes easy and gets round same-origin policy problems. With IE you cannot simply grab the src link of the iframe and happily navigate from the original ad to the PDF print page. What you can do, I believe, is issue an initial XMLHTTP request and grab the src attribute value (i.e. the link). Then pass that link to IE and carry on as shown below for the window-handling parts.
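The "plain request, then grab the iframe's src" step can be sketched in Python with the standard library. The sketch assumes the iframe and its `src` are present in the HTML the server returns. If the page injects the iframe with JavaScript, a plain HTTP request will not see it. The id `altiframe` comes from the answer's Selenium code. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class IframeSrcFinder(HTMLParser):
    """Remembers the src of the first <iframe> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag != "iframe" or self.src is not None:
            return
        attr_map = dict(attrs)  # attribute values arrive already HTML-unescaped
        if attr_map.get("id") == self.target_id:
            self.src = attr_map.get("src")


def find_iframe_src(html, page_url, iframe_id="altiframe"):
    """Absolute URL of the iframe's src, or None if no such iframe is in the HTML."""
    finder = IframeSrcFinder(iframe_id)
    finder.feed(html)
    finder.close()
    if not finder.src:
        return None
    return urljoin(page_url, finder.src)


def fetch_iframe_src(page_url, iframe_id="altiframe", timeout=15):
    """Plain GET of the job page (the XMLHTTP step), then extract the iframe link."""
    req = Request(page_url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return find_iframe_src(html, page_url, iframe_id)
```

Called with the Banque de France offer URL from the answer, `fetch_iframe_src` would return the link to hand to IE, provided the assumption above holds. Relative `src` values are resolved against the page URL with `urljoin`.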

With more time I could add the IE version. I will also look for a more robust way than adding explicit waits to monitor the file download before quitting the IE instance. It would likely be along the lines of this answer, which says: use SetWindowsHookEx to set up a WH_SHELL hook and look for the HSHELL_WINDOWCREATED event.
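A simpler alternative to a fixed wait is to poll the file system until the saved file exists and its size stops changing. This is a different technique from the WH_SHELL hook mentioned above. The Python sketch below shows that polling logic. `wait_for_download` and its parameters are hypothetical. A download that stalls long enough to look stable could fool it, so `stable_polls` is a tunable guess rather than a guarantee.

```python
import os
import time


def wait_for_download(path, timeout=30.0, poll=0.25, stable_polls=3):
    """Poll until `path` exists, is non-empty and its size has stopped changing.

    Returns True once the size has been identical for `stable_polls` consecutive
    checks, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    unchanged = 0
    while time.monotonic() < deadline:
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > 0 and size == last_size:
                unchanged += 1
                if unchanged >= stable_polls:
                    return True
            else:
                unchanged = 0
            last_size = size
        time.sleep(poll)
    return False
```

In the answer's flow, it would be called with the full path of the file typed into the Save As box, just before quitting the browser.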

Notes:

  1. This is written for 64-bit Office. In VBA7 (Office 2010 and later) the PtrSafe/LongPtr declarations also compile in 32-bit Office, where LongPtr is simply a Long. Only older, pre-VBA7 hosts need PtrSafe removed and LongPtr switched to Long.
  2. Huge thanks to @ErikvonAsmuth for his enormous patience in going through the APIs with me. Take a look at his excellent answer on working with windows.


VBA:

Option Explicit

Declare PtrSafe Function SendMessageW Lib "User32" (ByVal hWnd As LongPtr, ByVal wMsg As Long, ByVal wParam As LongPtr, ByVal lParam As LongPtr) As LongPtr

Declare PtrSafe Function FindWindowExW Lib "User32" (ByVal hWndParent As LongPtr, _
                                                     Optional ByVal hwndChildAfter As LongPtr, Optional ByVal lpszClass As LongPtr, _
                                                     Optional ByVal lpszWindow As LongPtr) As LongPtr

Public Declare PtrSafe Function FindWindowW Lib "User32" (ByVal lpClassName As LongPtr, Optional ByVal lpWindowName As LongPtr) As LongPtr

Public Const WM_SETTEXT As Long = &HC
Public Const BM_CLICK As Long = &HF5

Public Sub GetInfo()
    Dim d As WebDriver, keys As New Selenium.keys
    Const MAX_WAIT_SEC As Long = 5
    Dim t As Single   ' Timer returns seconds since midnight as a Single

    Set d = New ChromeDriver
    Const URL = "https://www.recrutement.banque-france.fr/detail-offre/charge-de-recrutement-confirme-h-f-2037343/"
    With d
        .start "Chrome"
        .get URL
        .SwitchToFrame .FindElementById("altiframe")
        .FindElementById("btn-pdf").Click
        .SwitchToNextWindow
        .SendKeys keys.Control, "s"

        Dim str1 As String, cls As String, name As String
        Dim ptrSaveWindow As LongPtr

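        ' #32770 is the window class of standard dialog boxes such as Save As
        ' (this finds any top-level dialog with that class, not necessarily Chrome's)
        str1 = "#32770" & vbNullChar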

        t = Timer
        Do
            DoEvents
            ptrSaveWindow = FindWindowW(StrPtr(str1))
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While ptrSaveWindow = 0

        Dim duiViewWND As LongPtr, directUIHWND As LongPtr
        Dim floatNotifySinkHWND As LongPtr, comboBoxHWND As LongPtr, editHWND As LongPtr


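        ' Give up (and still close the browser) if the dialog never appeared
        If ptrSaveWindow = 0 Then GoTo CleanUp

        ' Descend the first-child chain found with Spy++:
        ' #32770 > DUIViewWndClassName > DirectUIHWND > FloatNotifySink > ComboBox > Edit
        duiViewWND = FindWindowExW(ptrSaveWindow, 0&)
        If duiViewWND = 0 Then GoTo CleanUp

        directUIHWND = FindWindowExW(duiViewWND, 0&)
        If directUIHWND = 0 Then GoTo CleanUp

        floatNotifySinkHWND = FindWindowExW(directUIHWND, 0&)
        If floatNotifySinkHWND = 0 Then GoTo CleanUp

        comboBoxHWND = FindWindowExW(floatNotifySinkHWND, 0&)
        If comboBoxHWND = 0 Then GoTo CleanUp

        editHWND = FindWindowExW(comboBoxHWND, 0&)
        If editHWND = 0 Then GoTo CleanUp

        ' Type the target file name into the file-name Edit control
        Dim msg As String
        msg = "myTest.pdf" & vbNullChar

        SendMessageW editHWND, WM_SETTEXT, 0, StrPtr(msg)

        .SendKeys keys.Control, "s"

        ' The Save button is a direct child of the dialog; its caption depends on the Windows language
        Dim ptrSaveButton As LongPtr
        cls = "Button" & vbNullChar
        name = "&Save" & vbNullChar

        ptrSaveButton = FindWindowExW(ptrSaveWindow, 0, StrPtr(cls), StrPtr(name))
        If ptrSaveButton = 0 Then GoTo CleanUp

        SendMessageW ptrSaveButton, BM_CLICK, 0, 0

        ' Crude fixed wait for the download to finish (Application.Wait is Excel-only)
        Application.Wait Now + TimeSerial(0, 0, 4)

CleanUp:
        .Quit
    End With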
End Sub


[Screenshot: Save As dialog window]

References:

  1. Shadow DOM
  2. Using shadow DOM - MDN Web Docs


Project references:

  1. Selenium Type Library
