python-docx returning empty cells when they should be full


Question

I am trying to iterate through all the tables in a document and extract the text from them. As an intermediate step, I am just trying to print the text to the console.

I have looked at other code provided by scanny in similar posts, but for some reason it does not give me the expected output for the document I am parsing.

The document is available at https://www.ontario.ca/laws/regulation/140300

from docx import Document
from docx.enum.text import WD_COLOR_INDEX
import os, re, sys

document = Document("path/to/doc")

tables = document.tables

# Walk every table, row, and cell, printing the text of each paragraph
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                print(paragraph.text)

I expect this to print out all the text, but instead I get nothing. If I try print(row.cells), it just prints (), which is an empty tuple, I guess. My document definitely does have text in the cells, though. Not sure what's wrong here.

Any help is appreciated.
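
One way to check whether the cell text is really present in the file, independently of python-docx, is to read word/document.xml from the .docx archive directly. The following is a minimal sketch using only the standard library; the function name table_texts is hypothetical, and it assumes the file is a valid .docx zip archive with the usual WordprocessingML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def table_texts(docx_path):
    """Return cell text as a nested list: tables -> rows -> cells.

    Hypothetical helper: it reads the XML directly and does not use python-docx.
    Note that iter() is recursive, so text in nested tables is also reported
    as part of the enclosing table.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.iter(W + "tr"):
            cells = []
            for tc in tr.iter(W + "tc"):
                # Join the text runs of each paragraph, one paragraph per line
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(W + "t"))
                    for p in tc.iter(W + "p")
                ]
                cells.append("\n".join(paragraphs))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Calling table_texts("path/to/doc") and printing the result shows what the file itself contains. If text appears here while row.cells is empty, that would suggest the table markup in the file, rather than the loop above, is the place to look.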

Recommended answer

Found the error. I was using a third-party tool (multiDoc converter) to convert old .doc files to .docx format. It works for the most part, but there must be some metadata that doesn't convert properly, because that was causing the issue. Opening the file and manually saving it as .docx solved the problem. The only catch is that I want to convert 2000+ files to .docx, so I'll need to find another way to convert them.
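
For the batch conversion, one option that might be worth trying is LibreOffice in headless mode, driven from Python. This is only a sketch: it assumes LibreOffice is installed and its soffice executable is on the PATH, the function names build_convert_command and convert_all are hypothetical, and whether its output avoids the problem seen with the third-party converter would need to be checked on a few sample files first.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to .docx."""
    return [
        soffice,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(out_dir),
        str(src),
    ]


def convert_all(src_dir, out_dir, soffice="soffice"):
    """Convert every .doc file in src_dir; return the files that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    failed = []
    for src in sorted(Path(src_dir).glob("*.doc")):
        result = subprocess.run(
            build_convert_command(src, out_dir, soffice),
            capture_output=True,
            text=True,
        )
        # Treat a non-zero exit code or a missing output file as a failure
        if result.returncode != 0 or not (out_dir / (src.stem + ".docx")).exists():
            failed.append(src)
    return failed
```

Running convert_all("old_docs", "converted") (both directory names are placeholders) and then opening a handful of the converted files with python-docx is a quick way to see whether row.cells is populated before committing to all 2000+ files.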
