使用Python将PDF转换为.docx [英] Convert PDF to .docx with Python
问题描述
我正在竭尽全力找到使用Python将 PDF文件转换为.docx文件的方法.
I'm trying very hard to find the way to convert a PDF file to a .docx file with Python.
我看到了与此相关的其他帖子,但就我而言,它们似乎都无法正常工作.
I have seen other posts related with this, but none of them seem to work correctly in my case.
我正在专门使用
import os
import subprocess
for top, dirs, files in os.walk('/my/pdf/folder'):
for filename in files:
if filename.endswith('.pdf'):
abspath = os.path.join(top, filename)
subprocess.call('lowriter --invisible --convert-to doc "{}"'
.format(abspath), shell=True)
这给了我Output [1],但是,在我的文件夹中找不到任何.docx文档.
This gives me Output[1], but then, I can't find any .docx document in my folder.
我已经安装了LibreOffice 5.3.
I have LibreOffice 5.3 installed.
有任何线索吗?
提前谢谢!
推荐答案
我不知道使用libreoffice将pdf
文件转换为Word
文件的方法.
但是,可以将pdf
转换为html
,然后将html
转换为docx
.
首先,使命令在命令行上运行. (以下是在Linux上.因此,您可能必须填写Soffice二进制文件的路径名,并在操作系统上为输入文件使用完整路径)
I am not aware of a way to convert a pdf
file into a Word
file using libreoffice.
However, you can convert from a pdf
to a html
and then convert the html
to a docx
.
Firstly, get the commands running on the command line. (The following is on Linux. So you may have to fill in path names to the soffice binary and use a full path for the input file on your OS)
soffice --convert-to html ./my_pdf_file.pdf
然后
soffice --convert-to docx:'MS Word 2007 XML' ./my_pdf_file.html
您应该最终得到:
my_pdf_file.pdf
my_pdf_file.html
my_pdf_file.docx
my_pdf_file.pdf
my_pdf_file.html
my_pdf_file.docx
现在将命令包装在subprocess
代码中
Now wrap the commands in your subprocess
code
这篇关于使用Python将PDF转换为.docx的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!