Download Embedded PDF File


Problem description

Question: How do I download a PDF file that is embedded in an Excel workbook?

This question has been asked many times, but I have not seen a single working answer anywhere.

So here is an attempt to self-answer the question. This code works and does not depend on the unreliable .Verb Verb:=xlPrimary method.

Recommended answer

Note: This only works for PDF files. If the workbook contains a mix of embedded file types, it will not work.

Basic preparation

  1. Let's say our Excel file C:\Users\routs\Desktop\Sample.xlsx has 2 PDF files embedded.
  2. For testing purposes, we will create a temp folder on our desktop: C:\Users\routs\Desktop\Temp.
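The temp folder can also be created from VBA rather than by hand. This is a minimal sketch; the procedure name EnsureTempFolder is hypothetical and not part of the original answer, and the path is the one used in this walkthrough.

```vba
Option Explicit

Sub EnsureTempFolder()
    ' Path taken from the setup described above
    Const TmpPath As String = "C:\Users\routs\Desktop\Temp"

    ' Dir with vbDirectory returns an empty string when nothing exists at that path
    If Dir(TmpPath, vbDirectory) = "" Then MkDir TmpPath
End Sub
```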

Logic:

  1. The Excel file is essentially just a .zip file.
  2. Excel saves the oleObjects in the \xl\embeddings\ folder. If you rename the Excel file to .zip and open it in, say, WinZip, you can see the oleObject .bin files there.

If you extract the bin files and rename them to .pdf, you will be able to open them in Microsoft Edge but not in any other PDF viewer. To make them compatible with other PDF viewers, we have to do some binary reading and editing.

If you open the bin file in any hex editor, you will see that extra data precedes the %PDF marker. I used the online hex editor https://hexed.it/

We have to delete everything before %PDF.

To do that, we will look for the 8-bit unsigned values of %PDF, or more specifically of %, P, D and F.

If you scroll down in the hex editor, you will find those four values:

Value of %: 37
Value of P: 80
Value of D: 68
Value of F: 70
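The four values can be double-checked in the VBA Immediate window without a hex editor. This is a small sketch; the procedure name ShowPdfSignatureBytes is hypothetical.

```vba
Option Explicit

Sub ShowPdfSignatureBytes()
    Dim sig As String
    Dim i As Long

    sig = "%PDF"

    ' Print each character next to its character code
    For i = 1 To Len(sig)
        Debug.Print Mid$(sig, i, 1), Asc(Mid$(sig, i, 1))
    Next i
End Sub
```

These are the same numbers (37, 80, 68, 70) that the search loop in the code compares against.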

Now all we have to do is read the binary file, delete everything before %PDF, and save the file with a .pdf extension.
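As a compact illustration of that step, the sketch below reads the whole file into a Byte array with a single Get, locates the %PDF signature, and writes everything from that point on. It is an alternative formulation under stated assumptions, not the code from the answer: the names FindPdfStart and StripOleHeader are hypothetical, and the array is assumed to be 1-based so that a return value of 0 can mean "not found".

```vba
Option Explicit

' Returns the index of the first %PDF signature in a 1-based byte array,
' or 0 when the signature is absent.
Function FindPdfStart(data() As Byte) As Long
    Dim i As Long
    For i = LBound(data) To UBound(data) - 3
        If data(i) = 37 And data(i + 1) = 80 And _
           data(i + 2) = 68 And data(i + 3) = 70 Then
            FindPdfStart = i
            Exit Function
        End If
    Next i
End Function

Sub StripOleHeader(binPath As String, pdfPath As String)
    Dim f As Integer
    Dim n As Long, startPos As Long, i As Long
    Dim data() As Byte, pdf() As Byte

    ' Read the whole bin file into memory in one call
    f = FreeFile
    Open binPath For Binary Access Read As #f
    n = LOF(f)
    If n > 0 Then
        ReDim data(1 To n)
        Get #f, , data
    End If
    Close #f
    If n = 0 Then Exit Sub

    ' Skip files that do not contain a PDF payload
    startPos = FindPdfStart(data)
    If startPos = 0 Then Exit Sub

    ' Copy everything from %PDF to the end of the file
    ReDim pdf(1 To n - startPos + 1)
    For i = 1 To UBound(pdf)
        pdf(i) = data(startPos + i - 1)
    Next i

    ' Binary mode does not truncate an existing file, so remove it first
    If Dir(pdfPath) <> "" Then Kill pdfPath

    f = FreeFile
    Open pdfPath For Binary Access Write As #f
    Put #f, , pdf
    Close #f
End Sub
```

A call would look like StripOleHeader "C:\Users\routs\Desktop\Temp\oleObject1.bin", "C:\Users\routs\Desktop\Temp\First.pdf", where the output name First.pdf is only an example.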

Code:

Option Explicit

Const TmpPath As String = "C:\Users\routs\Desktop\Temp"
Const ExcelFile As String = "C:\Users\routs\Desktop\Sample.xlsx"
Const ZipName As String = "C:\Users\routs\Desktop\Sample.zip"

Sub ExtractPDF()
    Dim tmpPdf As String
    Dim oApp As Object
    Dim i As Long

    '~~> Deleting any previously created files. This is
    '~~> usually helpful from 2nd run onwards
    On Error Resume Next
    Kill ZipName
    Kill TmpPath & "\*.*"
    On Error GoTo 0

    '~~> Copy and rename the Excel file as zip file
    FileCopy ExcelFile, ZipName

    Set oApp = CreateObject("Shell.Application")

    '~~> Extract the bin file from xl\embeddings\
    For i = 1 To oApp.Namespace(ZipName).items.Count
        oApp.Namespace(TmpPath).CopyHere oApp.Namespace(ZipName).items.Item("xl\embeddings\oleObject" & i & ".bin")

        tmpPdf = TmpPath & "\oleObject" & i & ".bin"

        '~~> Read and Edit the Bin File
        If Dir(tmpPdf) <> "" Then ReadAndWriteExtractedBinFile tmpPdf
    Next i

    MsgBox "Done"
End Sub

'~~> Read and ReWrite Bin File
Sub ReadAndWriteExtractedBinFile(s As String)
    Dim intFileNum As Long, bytTemp As Byte
    Dim MyAr() As Long, NewAr() As Long
    Dim fileName As String
    Dim i As Long, j As Long, k As Long

    j = 1

    intFileNum = FreeFile

    '~~> Open the bin file
    Open s For Binary Access Read As intFileNum
    '~~> Count the bytes in the bin file to size the array
    Do While Not EOF(intFileNum)
        Get intFileNum, , bytTemp
        j = j + 1
    Loop

    '~~> Create an array to store the filtered results of the bin file
    '~~> We will use this to recreate the bin file
    ReDim MyAr(1 To j)
    j = 1

    '~~> Go to first record
    If EOF(intFileNum) Then Seek intFileNum, 1

    '~~> Store the contents of bin file in an array
    Do While Not EOF(intFileNum)
        Get intFileNum, , bytTemp
        MyAr(j) = bytTemp
        j = j + 1
    Loop
    Close intFileNum

    '~~> Check for %PDF (bytes 37, 80, 68, 70) and filter out everything before it
    For i = LBound(MyAr) To UBound(MyAr)
        If i = UBound(MyAr) - 4 Then Exit For
        If Val(MyAr(i)) = 37 And Val(MyAr(i + 1)) = 80 And _
        Val(MyAr(i + 2)) = 68 And Val(MyAr(i + 3)) = 70 Then
            ReDim NewAr(1 To j - i + 2)

            k = 1
            For j = i To UBound(MyAr)
                NewAr(k) = MyAr(j)
                k = k + 1
            Next j

            Exit For
        End If
    Next i

    intFileNum = FreeFile

    '~~> Decide on the new name of the pdf file
    '~~> Format(Now, "ddmmyyhhmmss") gives a unique filename as long as
    '~~> no two files are written within the same second
    fileName = TmpPath & "\" & Format(Now, "ddmmyyhhmmss") & ".pdf"

    '~~> Write the new binary file
    Open fileName For Binary Lock Read Write As #intFileNum
    For i = LBound(NewAr) To UBound(NewAr)
        Put #intFileNum, , CByte(NewAr(i))
    Next i

    Close #intFileNum
End Sub
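
The timestamp-based name repeats if two PDFs are written within the same second. One possible way around that is to append a counter whenever the name is already taken. This is a sketch only; the helper UniquePdfName is hypothetical and not part of the original answer.

```vba
Option Explicit

' Returns a .pdf path inside folder that does not exist yet.
Function UniquePdfName(folder As String) As String
    Dim stamp As String, candidate As String
    Dim n As Long

    stamp = Format(Now, "ddmmyyhhmmss")
    candidate = folder & "\" & stamp & ".pdf"

    ' Add _1, _2, ... until the name is free
    Do While Dir(candidate) <> ""
        n = n + 1
        candidate = folder & "\" & stamp & "_" & n & ".pdf"
    Loop

    UniquePdfName = candidate
End Function
```

With this helper, the assignment to fileName could read fileName = UniquePdfName(TmpPath).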

Output

The extracted .pdf files are saved in C:\Users\routs\Desktop\Temp.
