Extract table from PowerPoint


Problem description


I am trying to extract a table from a PPT using python-pptx; however, I am not sure how to do that using shape.table.

```
from pptx import Presentation
prs = Presentation(path_to_presentation)
# text_runs will be populated with a list of strings,
# one for each text run in presentation
text_runs = []
for slide in prs.slides:
  for shape in slide.shapes:
    if shape.has_table:
      tbl = shape.table
      rows = tbl.rows.count
      cols = tbl.columns.count
```

I found a post here, but the accepted solution does not work: it raises an error saying that the count attribute is not available.

How do I modify the above code so I can get a table in a dataframe?

EDIT

Please see the image of the slide below

Solution

This appears to work for me.


```
from pptx import Presentation

prs = Presentation(path_to_presentation)

# text_runs will be populated with a list of strings,
# one for each text run found in a table cell
text_runs = []
for slide in prs.slides:
    for shape in slide.shapes:
        # skip every shape that does not contain a table
        if not shape.has_table:
            continue
        tbl = shape.table
        # use len() on rows and columns; they have no .count attribute
        row_count = len(tbl.rows)
        col_count = len(tbl.columns)
        for r in range(row_count):
            for c in range(col_count):
                cell = tbl.cell(r, c)
                for paragraph in cell.text_frame.paragraphs:
                    for run in paragraph.runs:
                        text_runs.append(run.text)

print(text_runs)
```
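
The loop above gives a flat list of text runs, not the dataframe the question asks for. One possible way to get there is sketched below; it is not part of the original answer. It assumes pandas is installed and that the first row of each table holds the column headers. The helper names (`table_to_rows`, `iter_tables`, `dataframes_from_pptx`) are made up for illustration. Tables nested inside group shapes are not visited, and merged cells are not treated specially.

```python
def table_to_rows(table):
    """Return the text of a python-pptx table as a list of rows (lists of str)."""
    # cell.text holds all the text of the cell as a single string
    return [[cell.text for cell in row.cells] for row in table.rows]


def iter_tables(prs):
    """Yield every table that sits directly on a slide of the presentation."""
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_table:
                yield shape.table


def dataframes_from_pptx(path, header=True):
    """Return one pandas DataFrame per table found in the .pptx file at `path`."""
    # Imported here so the two helpers above can be used without
    # python-pptx or pandas being importable.
    import pandas as pd
    from pptx import Presentation

    frames = []
    for table in iter_tables(Presentation(path)):
        rows = table_to_rows(table)
        if header and rows:
            # Assumption: the first row contains the column names
            frames.append(pd.DataFrame(rows[1:], columns=rows[0]))
        else:
            frames.append(pd.DataFrame(rows))
    return frames
```

Calling `dataframes_from_pptx(path_to_presentation)` should then return a list with one dataframe per table. Pass `header=False` if the first row is ordinary data rather than headers.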




