Convert PDF to .docx with Python

Views: 960 | Published: 2020/4/30 10:41:59 | Tags: python, pdf, docx, libreoffice, doc

Question

I'm trying very hard to find a way to convert a PDF file to a .docx file with Python.

I have seen other posts related to this, but none of them seem to work correctly in my case.

Specifically, I am using:

import os
import subprocess

for top, dirs, files in os.walk('/my/pdf/folder'):
    for filename in files:
        if filename.endswith('.pdf'):
            abspath = os.path.join(top, filename)
            subprocess.call('lowriter --invisible --convert-to doc "{}"'
                            .format(abspath), shell=True)

This gives me Output[1], but then I can't find any .docx document in my folder.

I have LibreOffice 5.3 installed.

Any clues?

Thanks in advance!

Recommended answer

I am not aware of a way to convert a PDF file directly into a Word file using LibreOffice.
However, you can convert the PDF to HTML and then convert the HTML to docx.
First, get the commands working on the command line. (The following is on Linux, so on your OS you may have to fill in the path to the soffice binary and use a full path for the input file.)

soffice --convert-to html ./my_pdf_file.pdf

Then:

soffice --convert-to docx:'MS Word 2007 XML' ./my_pdf_file.html

You should end up with:

my_pdf_file.pdf
my_pdf_file.html
my_pdf_file.docx

Now wrap the commands in your subprocess code.
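As a rough sketch of that wrapping, the two soffice calls could be run from Python along the following lines. The function names `build_commands` and `pdf_to_docx` are made up for illustration. The `--outdir` option is an addition that is not in the commands above; it is there so the results land next to the input file rather than in the current working directory. Passing each command as an argument list means the shell quotes around the filter name are not needed. This assumes `soffice` is on your PATH and has not been tested against every LibreOffice version.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path, soffice="soffice"):
    """Build the two soffice command lines (PDF -> HTML, HTML -> docx).

    `soffice` is the binary name or full path; adjust it for your OS.
    """
    pdf = Path(pdf_path)
    html = pdf.with_suffix(".html")
    outdir = str(pdf.parent)  # assumption: write output beside the input
    return [
        [soffice, "--convert-to", "html", "--outdir", outdir, str(pdf)],
        [soffice, "--convert-to", "docx:MS Word 2007 XML",
         "--outdir", outdir, str(html)],
    ]


def pdf_to_docx(pdf_path, soffice="soffice"):
    """Run both conversions in order and return the expected .docx path."""
    for cmd in build_commands(pdf_path, soffice):
        # check=True raises CalledProcessError if soffice exits non-zero
        subprocess.run(cmd, check=True)
    return Path(pdf_path).with_suffix(".docx")
```

To convert a whole folder, you could call `pdf_to_docx(os.path.join(top, filename))` inside the `os.walk` loop from the question.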
