VBA - 从PDF提取数据并添加到工作表 [英] VBA - Extract Data from PDF and Add to Worksheet

查看:2379
本文介绍了VBA - 从PDF提取数据并添加到工作表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个项目,我试图将数据从PDF文档提取到工作表中。 PDF已经显示和文本,可以手动复制并粘贴到Excel文档中。



我正在通过SendKeys来完成这个项目,并没有真正的工作太好当我尝试从PDF文档中粘贴任何我使用的方法的数据时,我收到一个错误!有谁知道一个漂亮的做事方式吗?这将是一个很大的帮助!而且,为什么我的贴不工作?如果我在宏已经停止运行之后粘贴正常粘贴?代码如下:

  Dim myPath As String,myExt As String 
Dim ws As Worksheet
Dim openPDF As Object
'Dim pasteData As MSForms.DataObject
Dim fCell As Range

'Set pasteData = New MSForms.DataObject
设置ws =表格(DATA)
如果ws.Cells(ws.Rows.Count,A)。End(xlUp).Row> 1 Then Range(A3:A& ws.Cells(ws.Rows.Count,A)。End(xlUp).Row).ClearContents

myExt =\ *。 pdf
当扫描收据按钮按下时扫描所选文件夹的收据
对于每个fCell范围(ws.Cells(1,1),ws.Cells(1,ws.Cells(1 ,ws.Columns.Count).End(xlToLeft).Column))
myPath = Dir(fCell.Value& myExt)
尽管myPath<>
myPath = fCell.Value& \& myPath
设置openPDF = CreateObject(Shell.Application)
openPDF.Open(myPath)
Application.Wait Now + TimeValue(00:00:2)
SendKeys ^ a
Application.Wait Now + TimeValue(00:00:2)
SendKeys^ c
'Application.Wait Now + TimeValue(00:00:2 )
ws.Select
ActiveSheet.Paste
'pasteData.GetFromClipboard

'ws.Cells(3,1)= pasteData.GetText
退出Sub

myPath = Dir
循环

下一步fCell


解决方案

您可以使用Adobe库打开PDF文件并提取其内容(我相信可以从Adobe下载,作为SDK的一部分, Acrobat的版本以及)



确保将库添加到引用中(在我的机器上,它是Adobe Acrobat 10.0类型库,但不知道是否是最新的版本)



即使使用Adobe库也不是微不足道的(您需要添加自己的错误捕获等):

 函数getTextFromPDF(ByVal strFilename As String)As String 
Dim objAVDoc As New AcroAVDoc
Dim objPDDoc As New AcroPDDoc
Dim objPage As AcroPDPage
Dim objSelection As AcroPDTextSelect
Dim objHighlight As AcroHiliteList
Dim pageNum As Long
Dim strText As String

strText =
如果(objAvDoc.Open(strFilename,)然后
设置objPDDoc = objAVDoc.GetPDDoc
对于pageNum = 0 To objPDDoc.GetNumPages() - 1
设置objPage = objPDDoc.AcquirePage(pageNum)
设置objHighlight =新的AcroHiliteList
objHighlight.Add 0,10000'如果没有获取页面上的所有文本,调整这个
设置objSelection = objPage.CreatePageHilite(objHighlight)

如果不是objSelection是没有,然后
对于tCount = 0 To objSelection.GetNumText - 1
strText = strText& objSelection.GetText(tCount)
下一个tCount
结束如果
下一页
objAVDoc.Close 1
结束如果

getTextFromPDF = strText

结束功能

这样做与您尝试的基本上是一样的做 - 只使用Adobe自己的图书馆。一次通过PDF一个页面,突出显示页面上的所有文本,然后将其(一次一个文本元素)放入字符串。



请记住,从这可以得到什么可能充满各种非打印字符(换行符,换行符等),甚至可能会在连续的文本块的中间,因此您可能需要其他代码要清理它,然后才能使用它。



希望有帮助!


I have a project where I am trying to extract the Data from a PDF document into a worksheet. The PDF's already show and text and can be manually copied and pasted into the Excel Document.

I am currently doing this project through SendKeys and it is not really working too well as I get an error when I try to paste the data from the PDF document whatever method I use! Does anyone know a prettier way of doing things? It would be a great help! And also, why is my paste not working?! If I paste after the macro has stopped running it pastes as normal? Code Below:

Dim myPath As String, myExt As String
Dim ws As Worksheet
Dim openPDF As Object
'Dim pasteData As MSForms.DataObject
Dim fCell As Range

'Set pasteData = New MSForms.DataObject
Set ws = Sheets("DATA")
If ws.Cells(ws.Rows.Count, "A").End(xlUp).Row > 1 Then Range("A3:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row).ClearContents

myExt = "\*.pdf"
'When Scan Receipts Button Pressed Scan the selected folder/s for receipts
For Each fCell In Range(ws.Cells(1, 1), ws.Cells(1, ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column))
    myPath = Dir(fCell.Value & myExt)
    Do While myPath <> ""
        myPath = fCell.Value & "\" & myPath
        Set openPDF = CreateObject("Shell.Application")
        openPDF.Open (myPath)
        Application.Wait Now + TimeValue("00:00:2")
        SendKeys "^a"
        Application.Wait Now + TimeValue("00:00:2")
        SendKeys "^c"
        'Application.Wait Now + TimeValue("00:00:2")
        ws.Select
        ActiveSheet.Paste
        'pasteData.GetFromClipboard

        'ws.Cells(3, 1) = pasteData.GetText
        Exit Sub

        myPath = Dir
    Loop

Next fCell

解决方案

You can open the PDF file and extract its contents using the Adobe library (which I believe you can download from Adobe as part of the SDK, but it comes with certain versions of Acrobat as well)

Make sure to add the Library to your references too (On my machine it is the Adobe Acrobat 10.0 Type Library, but not sure if that is the newest version)

Even with the Adobe library it is not trivial (you'll need to add your own error-trapping etc):

Function getTextFromPDF(ByVal strFilename As String) As String
   Dim objAVDoc As New AcroAVDoc
   Dim objPDDoc As New AcroPDDoc
   Dim objPage As AcroPDPage
   Dim objSelection As AcroPDTextSelect
   Dim objHighlight As AcroHiliteList
   Dim pageNum As Long
   Dim strText As String

   strText = ""
   If (objAvDoc.Open(strFilename, "") Then
      Set objPDDoc = objAVDoc.GetPDDoc
      For pageNum = 0 To objPDDoc.GetNumPages() - 1
         Set objPage = objPDDoc.AcquirePage(pageNum)
         Set objHighlight = New AcroHiliteList
         objHighlight.Add 0, 10000 ' Adjust this up if it's not getting all the text on the page
         Set objSelection = objPage.CreatePageHilite(objHighlight)

         If Not objSelection Is Nothing Then
            For tCount = 0 To objSelection.GetNumText - 1
               strText = strText & objSelection.GetText(tCount)
            Next tCount
         End If
      Next pageNum
      objAVDoc.Close 1
   End If

   getTextFromPDF = strText

End Function

What this does is essentially the same thing you are trying to do - only using Adobe's own library. It's going through the PDF one page at a time, highlighting all of the text on the page, then dropping it (one text element at a time) into a string.

Keep in mind what you get from this could be full of all kinds of non-printing characters (line feeds, newlines, etc) that could even end up in the middle of what look like contiguous blocks of text, so you may need additional code to clean it up before you can use it.

Hope that helps!

这篇关于VBA - 从PDF提取数据并添加到工作表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆