Loop over PDF files and transform them into doc with Word


Question

I am trying to use VBA, which I am pretty new to, to obtain a series of .doc documents from PDFs (which are not images). That is, I am trying to loop over various PDF files and save them in MS Word format. In my experience, Word reads the PDF documents I have pretty well: it maintains the correct layout of the PDF file most of the time. I am not sure whether this is the right way to tackle the problem, so alternative suggestions are welcome, using R if possible.

Anyway, here is the code that I found:

Sub convertToWord()

   Dim MyObj As Object, MySource As Object, file As Variant

   file = Dir("C:\Users\username\work_dir_example" & "*.pdf") 'pdf path

   Do While (file <> "")

   ChangeFileOpenDirectory "C:\Users\username\work_dir_example"

          Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""

    ChangeFileOpenDirectory "C:\Users\username\work_dir_example"

    ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
        , LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
         SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, CompatibilityMode:=15

    ActiveDocument.Close

     file = Dir

   Loop

End Sub

After pasting it into the developer's window, I save the code in a module -> I close the developer's window -> I click on the "Macros" button -> I execute the "convertToWord" macro. I get the following error in a pop-up box: "Sub or Function not defined". How do I fix this? Also, previously, for some reason that is not clear to me now, I got an error related to the function ChangeFileOpenDirectory, which also seemed not to be defined.

Update 27/08/2017

I changed the code to the following:

Sub convertToWord()

   Dim MyObj As Object, MySource As Object, file As Variant

   file = Dir("C:\Users\username\work_dir_example" & "*.pdf")

   ChDir "C:\Users\username\work_dir_example"

   Do While (file <> "")

        Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""

        ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
        , LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
         SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, CompatibilityMode:=15

    ActiveDocument.Close

     file = Dir

   Loop

End Sub

Now I do not get any error message in a pop-up box, but there is no output in my working directory. What might be wrong with it now?

Recommended answer

Any language that can read PDF files and write Word docs (which are XML) can do this, but the conversion you like (which Word does when the PDF is opened) will require using an API for the application itself. VBA is your easy option.

The snippets you've posted (and my samples below) use early binding and enumerated constants, which means we need a reference to the Word object library. That is already set up for any code you write in a Word document, so create a new Word document and add the code in a standard module. (See this Excel tutorial if you need more details; the steps for our process are the same.)
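For contrast, here is a minimal late-binding sketch, which is what you would need if the code lived in a host without a reference to the Word object library (for example, an Excel module). It is an illustration under stated assumptions, not part of the solution below: the file name `example.pdf` is hypothetical, and the constant is declared by hand because enumerated names such as `wdFormatDocumentDefault` are not available without the reference.

```vba
'Late-binding sketch: no reference to the Word object library is needed,
'so enumerated constants must be declared by hand.
Sub ConvertOnePdfLateBound()
    'Numeric value of Word's wdFormatDocumentDefault constant
    Const wdFormatDocumentDefault As Long = 16

    Dim wordApp As Object
    Dim doc As Object
    Dim pdfPath As String

    'Hypothetical file name; edit before running
    pdfPath = "C:\Users\username\work_dir_example\example.pdf"

    Set wordApp = CreateObject("Word.Application")
    wordApp.Visible = True 'Keep any conversion prompt visible

    'Opening a PDF makes Word convert it to an editable document
    Set doc = wordApp.Documents.Open(pdfPath, False)
    doc.SaveAs2 Replace(pdfPath, ".pdf", ".docx", , , vbTextCompare), _
                wdFormatDocumentDefault
    doc.Close False

    wordApp.Quit
End Sub
```

Inside a Word document none of this is necessary, which is why the rest of this answer sticks to early binding.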

You can run your macro from the VB Editor (using the Run button) or from the normal document window (click the Macros button on the View tab in Word 2010-2016). Save your document as a DOCM file if you want to reuse the macro without setting up the code again.

Now for the code!

As stated in the comments, your second snippet works if you just ensure that your folder paths end with a backslash "\" character. Without it, the folder name and the "*.pdf" wildcard are joined into a single pattern that matches no files, so the loop never runs. It's still not great code after you fix that, but that'll get you up and running.
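To see why the missing backslash matters, a small sketch like the following prints the search pattern that `Dir` actually receives, before and after the fix. The procedure name `ShowDirPattern` is made up for illustration; the folder is the placeholder path from the question.

```vba
'Illustration only: shows the pattern passed to Dir with and
'without the trailing backslash.
Sub ShowDirPattern()
    Dim folder As String
    folder = "C:\Users\username\work_dir_example"

    'Without the trailing backslash the wildcard is glued onto the
    'folder name, so Dir searches C:\Users\username for files
    'whose names start with "work_dir_example"
    Debug.Print folder & "*.pdf"

    'Append the backslash only if it is missing
    If Right$(folder, 1) <> "\" Then folder = folder & "\"

    'Now the pattern matches PDF files inside the folder
    Debug.Print folder & "*.pdf"
End Sub
```

Run it from the VB Editor and check the Immediate window (Ctrl+G) to compare the two patterns.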

I'll assume you want to go the extra mile and have a well-written version of this that you could repurpose or expand upon later. For simplicity, we'll use two procedures: the main conversion and a procedure to suppress the PDF conversion warning dialog (controlled by the registry).

Main procedure:

Sub ConvertPDFsToWord2()
    Dim path As String
    'Manually edit path in the next line before running
    path = "C:\users\username\work_dir_example\"

    Dim file As String
    Dim doc As Word.Document
    Dim regValPDF As Integer
    Dim originalAlertLevel As WdAlertLevel

    'Generate string for getting all PDFs with Dir command
    'Check for terminal \
    If Right(path, 1) <> "\" Then path = path & "\"
    'Append file type with wildcard
    file = path & "*.pdf"

    'Get path for first PDF (blank string if no PDFs exist)
    file = Dir(file)

    originalAlertLevel = Application.DisplayAlerts
    Application.DisplayAlerts = wdAlertsNone

    If file <> "" Then regValPDF = TogglePDFWarning(1)

    Do While file <> ""
        'Open method will automatically convert PDF for editing
        Set doc = Documents.Open(path & file, False)

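        'Save and close document. The text comparison makes the extension
        'swap case-insensitive, so "FILE.PDF" also becomes "FILE.docx"
        doc.SaveAs2 path & Replace(file, ".pdf", ".docx", , , vbTextCompare), _
                    fileformat:=wdFormatDocumentDefault
        doc.Close False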

        'Get path for next PDF (blank string if no PDFs remain)
        file = Dir
    Loop

CleanUp:
    On Error Resume Next 'Ignore errors during cleanup
    doc.Close False
    'Restore registry value, if necessary
    If regValPDF <> 1 Then TogglePDFWarning regValPDF
    Application.DisplayAlerts = originalAlertLevel

End Sub

Registry setting function:

Private Function TogglePDFWarning(newVal As Integer) As Integer
'This function reads and writes the registry value that controls
'the dialog displayed when Word opens (and converts) a PDF file
    Dim wShell As Object
    Dim regKey As String
    Dim regVal As Variant

    'setup shell object and string for key
    Set wShell = CreateObject("WScript.Shell")
    regKey = "HKCU\SOFTWARE\Microsoft\Office\" & _
             Application.Version & "\Word\Options\"

    'Get existing registry value, if any
    Dim readFailed As Boolean
    On Error Resume Next 'Ignore error if reg value does not exist
    regVal = wShell.RegRead(regKey & "DisableConvertPdfWarning")
    'Record the failure now: any On Error statement clears Err
    readFailed = (Err.Number <> 0)
    On Error GoTo 0      'Break on errors after this point

    wShell.RegWrite regKey & "DisableConvertPdfWarning", newVal, "REG_DWORD"

    'Return original setting / registry value (0 if omitted)
    If readFailed Or regVal = 0 Then
        TogglePDFWarning = 0
    Else
        TogglePDFWarning = 1
    End If

End Function
