从 PDF 中提取数据并添加到工作表 [英] Extract Data from PDF and Add to Worksheet

查看:37
本文介绍了从 PDF 中提取数据并添加到工作表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将 PDF 文档中的数据提取到工作表中.PDF 显示和文本可以手动复制并粘贴到 Excel 文档中.

I am trying to extract the data from a PDF document into a worksheet. The PDFs show and text can be manually copied and pasted into the Excel document.

我目前正在通过 SendKeys 执行此操作,但它不起作用.当我尝试粘贴 PDF 文档中的数据时出现错误.为什么我的粘贴不起作用?如果我在宏停止运行后粘贴,它会正常粘贴.

I am currently doing this through SendKeys and it is not working. I get an error when I try to paste the data from the PDF document. Why is my paste not working? If I paste after the macro has stopped running it pastes as normal.

Dim myPath As String, myExt As String
Dim ws As Worksheet
Dim openPDF As Object
'Dim pasteData As MSForms.DataObject
Dim fCell As Range

'Set pasteData = New MSForms.DataObject
Set ws = Sheets("DATA")
If ws.Cells(ws.Rows.Count, "A").End(xlUp).Row > 1 Then Range("A3:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row).ClearContents

myExt = "*.pdf"
'When Scan Receipts Button Pressed Scan the selected folder/s for receipts
For Each fCell In Range(ws.Cells(1, 1), ws.Cells(1, ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column))
    myPath = Dir(fCell.Value & myExt)
    Do While myPath <> ""
        myPath = fCell.Value & "" & myPath
        Set openPDF = CreateObject("Shell.Application")
        openPDF.Open (myPath)
        Application.Wait Now + TimeValue("00:00:2")
        SendKeys "^a"
        Application.Wait Now + TimeValue("00:00:2")
        SendKeys "^c"
        'Application.Wait Now + TimeValue("00:00:2")
        ws.Select
        ActiveSheet.Paste
        'pasteData.GetFromClipboard

        'ws.Cells(3, 1) = pasteData.GetText
        Exit Sub

        myPath = Dir
    Loop

Next fCell

推荐答案

您可以打开 PDF 文件并使用 Adob​​e 库提取其内容(我相信您可以从 Adob​​e 作为 SDK 的一部分下载,但它附带某些版本的 Acrobat 也是如此)

You can open the PDF file and extract its contents using the Adobe library (which I believe you can download from Adobe as part of the SDK, but it comes with certain versions of Acrobat as well)

确保也将库添加到您的参考文献中(在我的机器上它是 Adob​​e Acrobat 10.0 类型库,但不确定那是否是最新版本)

Make sure to add the Library to your references too (On my machine it is the Adobe Acrobat 10.0 Type Library, but not sure if that is the newest version)

即使使用 Adob​​e 库,它也不是微不足道的(您需要添加自己的错误捕获等):

Even with the Adobe library it is not trivial (you'll need to add your own error-trapping etc):

Function getTextFromPDF(ByVal strFilename As String) As String
   Dim objAVDoc As New AcroAVDoc
   Dim objPDDoc As New AcroPDDoc
   Dim objPage As AcroPDPage
   Dim objSelection As AcroPDTextSelect
   Dim objHighlight As AcroHiliteList
   Dim pageNum As Long
   Dim strText As String

   strText = ""
   If (objAvDoc.Open(strFilename, "") Then
      Set objPDDoc = objAVDoc.GetPDDoc
      For pageNum = 0 To objPDDoc.GetNumPages() - 1
         Set objPage = objPDDoc.AcquirePage(pageNum)
         Set objHighlight = New AcroHiliteList
         objHighlight.Add 0, 10000 ' Adjust this up if it's not getting all the text on the page
         Set objSelection = objPage.CreatePageHilite(objHighlight)

         If Not objSelection Is Nothing Then
            For tCount = 0 To objSelection.GetNumText - 1
               strText = strText & objSelection.GetText(tCount)
            Next tCount
         End If
      Next pageNum
      objAVDoc.Close 1
   End If

   getTextFromPDF = strText

End Function

这与您尝试做的事情本质上是一样的 - 仅使用 Adob​​e 自己的库.它一次一页地浏览 PDF,突出显示页面上的所有文本,然后将其(一次一个文本元素)放入一个字符串中.

What this does is essentially the same thing you are trying to do - only using Adobe's own library. It's going through the PDF one page at a time, highlighting all of the text on the page, then dropping it (one text element at a time) into a string.

请记住,您从中得到的内容可能充满各种非打印字符(换行符、换行符等),甚至可能出现在看起来像连续文本块的中间,因此您可能需要额外的代码来清理它才能使用它.

Keep in mind what you get from this could be full of all kinds of non-printing characters (line feeds, newlines, etc) that could even end up in the middle of what look like contiguous blocks of text, so you may need additional code to clean it up before you can use it.

希望有帮助!

这篇关于从 PDF 中提取数据并添加到工作表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆