Having trouble using Python and LibreOffice to convert pdf to docx and doc to docx

Question

I have spent a good amount of time trying to determine what is going wrong exactly, with the code I am using to convert pdf to docx (and doc to docx) using LibreOffice.

I have used both the windows run interface to test-run some of the code I have found to be relevant, and have tried on python as well, neither of which works.

I have LibreOffice v6.0.2 installed on Windows. I have been using variations of this code to try to convert some PDFs to docx (the specific PDF file is not really relevant):

    import subprocess
    lowriter='C://Program Files/LibreOffice/program/swriter.exe'
    subprocess.run('{} --invisible --convert-to docx --outdir "{}" "{}"'
                   .format(lowriter,'dir', 'filepath.pdf',),shell=True)

I have tried the code, again, both in the Run interface on Windows and through Python using the code above, with no luck. I have also tried without --outdir, in case I was writing it incorrectly, but I always get a return code of 1:

    CompletedProcess(args='C://Program Files/LibreOffice/program/swriter.exe 
    --invisible --convert-to docx --outdir "{dir}" 
    {filepath.pdf}"', returncode=1)

dir and filepath.pdf are placeholders I have put in.

I have a similar problem with the doc to docx conversion.

Recommended answer

There are a number of problems here. You should first get the --convert-to call to work from the command line as @CristiFati commented, and then implement in python.

Here is the code that works on my system. No // in the path, and quotes are needed. Also, the folder is LibreOffice 5 on my system.

    import subprocess
    lowriter = 'C:/Program Files (x86)/LibreOffice 5/program/swriter.exe'
    subprocess.run(
        '"{}" --convert-to docx --outdir "{}" "{}"'
        .format(lowriter,'dir', 'filepath.doc',), shell=True)
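
As an alternative to quoting the command string by hand, the same call can be sketched with an argument list and no shell, so that a path containing spaces is passed as a single argument. This is only a sketch under assumptions: the install path, the helper names `build_convert_command` and `convert`, and the use of `--headless` are not part of the answer above, and the path will need adjusting for your system.

```python
import subprocess
from pathlib import Path

# Assumed install location; adjust to match your system.
SOFFICE = Path("C:/Program Files/LibreOffice/program/soffice.exe")


def build_convert_command(soffice, target_format, outdir, source):
    """Return the argument list for a headless LibreOffice conversion."""
    return [
        str(soffice),
        "--headless",  # assumption: run without a UI; not used in the answer
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, outdir, target_format="docx", soffice=SOFFICE):
    """Run the conversion and return the CompletedProcess for inspection."""
    if not Path(soffice).is_file():
        raise FileNotFoundError(f"LibreOffice executable not found: {soffice}")
    command = build_convert_command(soffice, target_format, outdir, source)
    # No shell=True: each list item is passed as one argument, spaces included.
    return subprocess.run(command, capture_output=True, text=True)
```

With the same placeholders as above, `result = convert('filepath.doc', 'dir')` would run the conversion. `result.returncode` and `result.stderr` can then be inspected if it fails.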

Finally, it looks like converting from PDF to DOCX is not supported. LibreOffice Draw can open a PDF file and save as ODG format.

EDIT:

Here is working code to convert from PDF. I upgraded to LO 6, so the version number ("LibreOffice 5") is no longer required in the path.

    import subprocess
    loffice = 'C:/Program Files/LibreOffice/program/soffice.exe'
    subprocess.run(
        '"{}" --convert-to odg --outdir "{}" "{}"'
        .format(loffice,'dir', 'filepath.pdf',), shell=True)
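
Since the question only had a return code to go on, it may also help to check whether the output file actually appeared. The sketch below assumes LibreOffice writes the result into the output directory under the input's base name with the target extension, and that the target format is a plain extension such as `odg` with no filter suffix. The helper names `expected_output_path` and `conversion_succeeded` are hypothetical.

```python
from pathlib import Path


def expected_output_path(outdir, source, target_format):
    """Guess where the converted file lands (assumed naming scheme)."""
    return Path(outdir) / (Path(source).stem + "." + target_format)


def conversion_succeeded(outdir, source, target_format):
    """Return True if the expected output file exists and is not empty."""
    output = expected_output_path(outdir, source, target_format)
    return output.is_file() and output.stat().st_size > 0
```

After running the conversion, `conversion_succeeded('dir', 'filepath.pdf', 'odg')` would report whether `dir/filepath.odg` was produced.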
