Loop over PDF files and transform them into doc with Word
Problem description
I am trying to use VBA, which I am pretty new to, to produce a series of .doc documents from PDFs (which are not images); that is, I am trying to loop over various PDF files and save them in MS Word format. In my experience, Word reads my PDF documents pretty well: it maintains the correct layout of the PDF file most of the time. I am not sure whether this is the right way to tackle this, and I would welcome an alternative suggestion, using R if possible.
Anyway, here is the code, which I found here:
Sub convertToWord()
Dim MyObj As Object, MySource As Object, file As Variant
file = Dir("C:\Users\username\work_dir_example" & "*.pdf") 'pdf path
Do While (file <> "")
ChangeFileOpenDirectory "C:\Users\username\work_dir_example"
Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
"", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
Format:=wdOpenFormatAuto, XMLTransform:=""
ChangeFileOpenDirectory "C:\Users\username\work_dir_example"
ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
, LockComments:=False, Password:="", AddToRecentFiles:=True, _
WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
False, CompatibilityMode:=15
ActiveDocument.Close
file = Dir
Loop
End Sub
After pasting it into the developer's window, I save the code in a module, close the developer's window, click the "Macros" button, and execute the "convertToWord" macro. I get the following error in a pop-up box: "Sub or Function not defined". How do I fix this? Also, previously, for some reason that is not clear to me now, I got an error related to the function ChangeFileOpenDirectory, which also seemed not to be defined.
Update 27/08/2017
I changed the code to the following:
Sub convertToWord()
Dim MyObj As Object, MySource As Object, file As Variant
file = Dir("C:\Users\username\work_dir_example" & "*.pdf")
ChDir "C:\Users\username\work_dir_example"
Do While (file <> "")
Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
"", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
Format:=wdOpenFormatAuto, XMLTransform:=""
ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
, LockComments:=False, Password:="", AddToRecentFiles:=True, _
WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
False, CompatibilityMode:=15
ActiveDocument.Close
file = Dir
Loop
End Sub
Now I do not get any error messages in a pop up box, but there is no output in my working directory. What might be wrong with it right now?
Recommended answer
Any language that can read PDF files and write Word docs (which are XML) can do this, but the conversion you like (which Word does when the PDF is opened) will require using an API for the application itself. VBA is your easy option.
The snippets you've posted (and my samples below) use early binding and enumerated constants, which means we need a reference to the Word object library. That is already set up for any code you write in a Word document, so create a new Word document and add the code in a standard module. (See this Excel tutorial if you need more details; the steps for our process are the same.)
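For contrast, here is a minimal late-bound sketch of the same open-and-save step, which would apply if the code lived in a host other than Word (Excel, for example) with no reference to the Word object library. The procedure name and the file paths are hypothetical, and the named constant has to be replaced by its numeric value (16 for wdFormatDocumentDefault). Word may still show its PDF conversion prompt when the file is opened.

```vba
' Late-bound sketch: no reference to the Word object library is required.
' The procedure name and both file paths are placeholders for illustration.
Sub ConvertOnePdfLateBound()
    ' Numeric value of wdFormatDocumentDefault, because the named
    ' constant is not available without the library reference
    Const FORMAT_DOCUMENT_DEFAULT As Long = 16

    Dim wordApp As Object
    Dim doc As Object

    ' Start a separate Word instance through COM
    Set wordApp = CreateObject("Word.Application")

    ' Opening a PDF makes Word convert it to an editable document;
    ' the second argument is ConfirmConversions
    Set doc = wordApp.Documents.Open( _
        "C:\users\username\work_dir_example\sample.pdf", False)

    ' Second positional argument of SaveAs2 is FileFormat
    doc.SaveAs2 "C:\users\username\work_dir_example\sample.docx", _
        FORMAT_DOCUMENT_DEFAULT

    doc.Close False
    wordApp.Quit
End Sub
```

Inside a Word document, early binding stays the simpler choice, since the reference and the named constants are available without any setup.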
You can run your macro from the VB Editor (using the Run button) or from the normal document window (click the Macros button on the View tab in Word 2010-2016). Save your document as a DOCM file if you want to reuse the macro without setting up the code again.
Now for the code!
As stated in comments, your second snippet is valid if you just ensure that your folder paths end with a backslash "\" character. It's still not great code after you fix that, but that'll get you up and running.
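To see why the missing backslash matters, this small sketch prints the search pattern with and without it, using the folder from the question. The procedure name is made up for illustration. Without the separator, the wildcard is glued onto the folder name, so Dir most likely finds no match, returns an empty string, and the loop never runs, which would explain the absence of both errors and output.

```vba
' Illustrative only: prints the two patterns to the Immediate window
Sub ShowDirPattern()
    Dim folder As String
    folder = "C:\Users\username\work_dir_example"

    ' Without a separator the wildcard attaches to the folder name
    Debug.Print folder & "*.pdf"   ' C:\Users\username\work_dir_example*.pdf

    ' Add the backslash only when it is missing
    If Right(folder, 1) <> "\" Then folder = folder & "\"
    Debug.Print folder & "*.pdf"   ' C:\Users\username\work_dir_example\*.pdf
End Sub
```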
I'll assume you want to go the extra mile and have a well-written version of this you could repurpose or expand upon later. For simplicity, we'll use two procedures: the main conversion and a procedure to suppress the PDF conversion warning dialog (controlled by the registry).
Main procedure:
Sub ConvertPDFsToWord2()
    Dim path As String
    'Manually edit path in the next line before running
    path = "C:\users\username\work_dir_example\"
    Dim file As String
    Dim doc As Word.Document
    Dim regValPDF As Integer
    Dim originalAlertLevel As WdAlertLevel
    'Generate string for getting all PDFs with Dir command
    'Check for terminal \
    If Right(path, 1) <> "\" Then path = path & "\"
    'Append file type with wildcard
    file = path & "*.pdf"
    'Get name of first PDF (blank string if no PDFs exist)
    file = Dir(file)
    'Nothing to convert, so leave alerts and registry untouched
    If file = "" Then Exit Sub
    originalAlertLevel = Application.DisplayAlerts
    'Jump to CleanUp if anything fails, so settings are restored
    On Error GoTo CleanUp
    Application.DisplayAlerts = wdAlertsNone
    'Suppress the PDF conversion dialog and remember the old setting
    regValPDF = TogglePDFWarning(1)
    Do While file <> ""
        'Open method will automatically convert PDF for editing
        Set doc = Documents.Open(path & file, False)
        'Save and close document (extension match ignores case)
        doc.SaveAs2 path & Replace(file, ".pdf", ".docx", , , vbTextCompare), _
            FileFormat:=wdFormatDocumentDefault
        doc.Close False
        Set doc = Nothing
        'Get name of next PDF (blank string if no PDFs remain)
        file = Dir
    Loop
CleanUp:
    'Close a document left open by a failed iteration
    If Not doc Is Nothing Then doc.Close False
    Application.DisplayAlerts = originalAlertLevel
    'Restore registry value, if necessary
    If regValPDF <> 1 Then TogglePDFWarning regValPDF
End Sub
Registry setting function:
Private Function TogglePDFWarning(newVal As Integer) As Integer
    'This function reads and writes the registry value that controls
    'the dialog displayed when Word opens (and converts) a PDF file
    Dim wShell As Object
    Dim regKey As String
    Dim regVal As Variant
    'setup shell object and string for key
    Set wShell = CreateObject("WScript.Shell")
    regKey = "HKCU\SOFTWARE\Microsoft\Office\" & _
        Application.Version & "\Word\Options\"
    'Get existing registry value, if any
    On Error Resume Next 'Ignore error if reg value does not exist
    regVal = wShell.RegRead(regKey & "DisableConvertPdfWarning")
    On Error GoTo 0 'Break on errors after this point
    wShell.RegWrite regKey & "DisableConvertPdfWarning", newVal, "REG_DWORD"
    'Return original setting / registry value (0 if omitted)
    If Err.Number <> 0 Or regVal = 0 Then
        TogglePDFWarning = 0
    Else
        TogglePDFWarning = 1
    End If
End Function