Headless LibreOffice very slow to export to PDF on Windows (6 times slower than on Linux)
Question
I often need to export many (> 1000) .docx documents to PDF with LibreOffice. Here is a sample document: test.docx. The following code works, but it is quite slow on Windows (3.3 seconds on average for each PDF document):
import subprocess, docx, time  # first do: pip install python-docx
for i in range(10):
    doc = docx.Document('test.docx')
    for paragraph in doc.paragraphs:
        paragraph.text = paragraph.text.replace('{{num}}', str(i))
    doc.save('test%i.docx' % i)  # these 4 previous lines are super fast - a few ms
    t0 = time.time()
    # the executable path contains a space, so it is quoted inside the command string
    subprocess.call(r'"C:\Program Files\LibreOffice\program\soffice.exe" --headless --convert-to pdf test%i.docx --outdir . --nocrashreport --nodefault --nofirststartwizard --nolockcheck --nologo --norestore' % i)
    print('PDF generated in %.1f sec' % (time.time()-t0))
    # for linux:
    # (0.54 seconds on average, so it's 6 times better than on Windows!)
    # subprocess.call(['/usr/bin/soffice', '--headless', '--convert-to', 'pdf', '--outdir', '/home/user', 'test%i.docx' % i])
How can I speed up the PDF export on Windows?
I suspect that a lot of time is wasted in repeating "start LibreOffice/Writer, (do the job), close LibreOffice" once per document.
Notes:
- As a comparison: at https://bugs.documentfoundation.org/show_bug.cgi?id=92274 the export time is said to be either 90 ms or 810 ms.
- Replacing soffice.exe with swriter.exe gives the same problem: 3.3 seconds on average.
subprocess.call(r'"C:\Program Files\LibreOffice\program\swriter.exe" --headless --convert-to pdf test%i.docx --outdir .' % i)
Recommended answer
Indeed, all the time is wasted in starting and quitting LibreOffice. We can instead pass many docx documents in a single call of soffice.exe:
import subprocess, docx
for i in range(1000):
    doc = docx.Document('test.docx')
    for paragraph in doc.paragraphs:
        paragraph.text = paragraph.text.replace('{{num}}', str(i))
    doc.save('test%i.docx' % i)
# all PDFs in one pass (raw string so the backslashes in the path are kept literally):
subprocess.call([r'C:\Program Files\LibreOffice\program\swriter.exe',
                 '--headless', '--convert-to', 'pdf', '--outdir', '.'] + ['test%i.docx' % i for i in range(1000)])
107 seconds in total for 1000 documents, so about 107 ms on average per PDF, far better!
Notes:
- It does not work for 10,000 documents in a single call, because the length of the command-line arguments would exceed 32k characters.
- I wonder if it's possible to have a more interactive way to work with LibreOffice headless:
  - start Writer headless, keep it started
  - send an action like "open test1.docx" to this process
  - send the action "export to pdf", and close the docx
  - send "open test2.docx", then export, etc.
  - ...
  - quit Writer headless
This works with COM (Component Object Model) with MS Office (see ".doc to pdf using python"), but I wonder if something similar exists with LibreOffice. The answer seems to be no (see "Does LibreOffice/OpenOffice Support the COM Model").
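One way around the command-line length limit mentioned in the notes is to split the file list into batches, so that LibreOffice is started once per batch rather than once per document. The following is only a minimal sketch under stated assumptions: the helper names (make_batches, convert_all, _cost), the 30,000-character budget, and the 3-character overhead estimate per argument are all hypothetical choices for illustration, not part of LibreOffice or of the original answer.

```python
import subprocess

# Assumption: same install path as in the answer above; adjust to your system.
SOFFICE = r'C:\Program Files\LibreOffice\program\swriter.exe'
# Assumption: a conservative budget kept below the ~32k-character limit.
MAX_CMD_CHARS = 30000


def _cost(arg):
    """Rough size of one argument on the command line.

    Assumption: 1 separating space plus 2 possible quote characters.
    """
    return len(arg) + 3


def make_batches(filenames, base_cmd, max_chars=MAX_CMD_CHARS):
    """Split filenames into lists whose estimated command line fits max_chars."""
    base_len = sum(_cost(a) for a in base_cmd)
    batches, current, current_len = [], [], base_len
    for name in filenames:
        # Start a new batch when adding this file would exceed the budget.
        if current and current_len + _cost(name) > max_chars:
            batches.append(current)
            current, current_len = [], base_len
        current.append(name)
        current_len += _cost(name)
    if current:
        batches.append(current)
    return batches


def convert_all(filenames, outdir='.'):
    """Run one headless LibreOffice call per batch of files."""
    base_cmd = [SOFFICE, '--headless', '--convert-to', 'pdf', '--outdir', outdir]
    for batch in make_batches(filenames, base_cmd):
        subprocess.call(base_cmd + batch)

# Example call (not executed here):
# convert_all(['test%i.docx' % i for i in range(10000)])
```

With file names like test0.docx to test9999.docx this yields a handful of batches, so the startup cost is paid a few times instead of 10,000 times. The budget is deliberately rough; a smaller value only means a few more launches.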