python-docx: Parse a table to a pandas DataFrame

Question

I'm using the python-docx library to extract content from an MS Word document, and I can already get all the tables from the document with the same library. However, I'd like to parse each table into a pandas DataFrame. Is there any built-in functionality I can use for that, or do I have to do it manually? Also, is it possible to find the name of the heading under which a table appears? Thank you.

from docx import Document
from docx.shared import Inches
document = Document('test.docx')

tabs = document.tables


Answer

You can extract each table in the document into a DataFrame with this code:

from docx import Document
import pandas as pd

document = Document('test.docx')

tables = []
for table in document.tables:
    # Pre-fill a rows x columns grid with empty strings
    df = [['' for _ in range(len(table.columns))] for _ in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            if cell.text:
                df[i][j] = cell.text
    # Wrap the grid in a DataFrame; columns are labelled 0..n-1
    tables.append(pd.DataFrame(df))

print(tables)

The tables list then holds one DataFrame per table, in the order the tables appear in the document.
