Python - Convert tables from .doc / .docx files to .xls
Problem description
I'm tasked with converting a series of tables from .doc and .docx files to .xls, but have not managed to find an efficient way to do this. The tables may sit in between other text.
I have looked into pywin32, xlwt and a couple of other libraries, but it seems like I have to go through a lot of steps.
Any tips for converting tables from *.doc/*.docx to *.xls files?
Recommended answer
I'm assuming you have too many documents for copy/paste, and seek a pragmatic solution for internal use. This solution:
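- opens the files in Word in batch mode
- saves the files in HTML format, but with an .xls extension
- optionally, you can write a small script to remove everything outside the table tags from the HTML
- the HTML files will open in Excel by default; you just have to click through a warning.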
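The optional clean-up step in the list (dropping everything outside the table tags) could be sketched in Python roughly as follows. This is only a minimal illustration built on the standard library's html.parser; the names TableExtractor and keep_only_tables are made up for this example, and reading and writing the files (including picking the encoding Word used) is left to the caller.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the markup of every <table>...</table> in a document."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them
        super().__init__(convert_charrefs=False)
        self.depth = 0   # how many <table> tags are currently open
        self.parts = []  # markup pieces found inside a table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.parts.append(f"</{tag}>")
        if tag == "table":
            self.depth = max(self.depth - 1, 0)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def keep_only_tables(html_text):
    """Return a bare HTML page containing only the tables of html_text."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return "<html><body>\n" + "".join(parser.parts) + "\n</body></html>"
```

Run over each saved file, this would leave only the tables for Excel to display. Styles that Word placed in the page head are dropped along with the surrounding text, so the tables may look plainer than in Word.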
Create a macro in Word such as this:
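Sub BatchSaveAs()
    ' Set output_dir appropriately
    ChangeFileOpenDirectory "output_dir"
    ' Strip the extension at the last dot so that both .doc and .docx names work
    Dim baseName As String, outDocName As String
    baseName = ActiveDocument.Name
    If InStrRev(baseName, ".") > 0 Then baseName = Left(baseName, InStrRev(baseName, ".") - 1)
    outDocName = baseName & ".xls"
    ' Save as filtered HTML, but under an .xls name so that Excel opens it
    ActiveDocument.SaveAs FileName:=outDocName, FileFormat:= _
        wdFormatFilteredHTML, LockComments:=False, Password:="", AddToRecentFiles _
        :=True, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts _
        :=False, SaveNativePictureFormat:=False, SaveFormsData:=False, _
        SaveAsAOCELetter:=False
    ActiveWindow.View.Type = wdWebView
    ' Close Word so the calling script can move on to the next file
    Application.Quit SaveChanges:=wdDoNotSaveChanges
End Sub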
Now you can run Word in batch mode through a script which calls it for each input file:
winword file_name /mBatchSaveAs
(You may need to use full path names)
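Since the question is about Python, the calling script could itself be written in Python. The following is a rough sketch, not a tested recipe: build_commands and run_batch are hypothetical helper names, and it assumes that winword is on the PATH (otherwise pass the full path to WINWORD.EXE) and that the macro is stored where Word can find it at start-up, for example in the Normal template.

```python
import subprocess
from pathlib import Path


def build_commands(input_dir, winword="winword", macro="BatchSaveAs"):
    """Return one Word command line per .doc/.docx file in input_dir."""
    commands = []
    for path in sorted(Path(input_dir).iterdir()):
        # Skip the "~$..." lock files Word creates next to open documents
        if path.name.startswith("~$"):
            continue
        if path.suffix.lower() in {".doc", ".docx"}:
            # Full path names, as suggested above; /m<name> runs the macro
            commands.append([winword, str(path.resolve()), f"/m{macro}"])
    return commands


def run_batch(input_dir, winword="winword"):
    """Start Word once per document; the macro quits Word when it is done."""
    for command in build_commands(input_dir, winword):
        subprocess.run(command, check=True)
```

Calling run_batch(r"C:\docs") would then process the folder one file at a time. Because the macro ends with Application.Quit, each subprocess.run call returns once Word has closed.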
If the warning on opening the HTML / Excel files is not OK, you could write a little Python script to run Excel in batch mode. This shows how to run Excel from Python:
Some tricks I found useful: use finally for your clean-up; the code you need looks like VBA code, and if you're not good at VBA, record a macro to do what you want and modify it for Python syntax.
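To illustrate those tricks, here is a hedged sketch of driving Excel from Python through pywin32's COM interface, with finally used for the clean-up. It is an assumption about what such a script might look like, not the code the answer originally linked to. The helper names xls_output_path and convert_with_excel are invented for this example, 56 is the value of Excel's xlExcel8 file format constant, and out_dir is assumed to be a different folder from the input so that the open file is not overwritten.

```python
from pathlib import Path

XL_EXCEL8 = 56  # Excel's xlExcel8 constant: the binary .xls workbook format


def xls_output_path(html_path, out_dir):
    """Map an input file to an .xls file with the same stem in out_dir."""
    return Path(out_dir) / (Path(html_path).stem + ".xls")


def convert_with_excel(html_paths, out_dir):
    """Open each HTML-as-.xls file in Excel and re-save it as a real .xls."""
    import win32com.client  # pywin32; imported here because it is Windows-only

    excel = win32com.client.Dispatch("Excel.Application")
    try:
        excel.Visible = False
        excel.DisplayAlerts = False  # avoid interactive prompts in batch mode
        for html_path in html_paths:
            # The calls below mirror what a recorded VBA macro would contain
            workbook = excel.Workbooks.Open(str(Path(html_path).resolve()))
            try:
                target = xls_output_path(html_path, out_dir).resolve()
                workbook.SaveAs(str(target), FileFormat=XL_EXCEL8)
            finally:
                workbook.Close(SaveChanges=False)
    finally:
        # Always quit, otherwise a hidden Excel process keeps running
        excel.Quit()
```

The nested try/finally blocks mean the workbook is closed and Excel is shut down even if one file fails to open or save.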