python-docx: extracting a table from a Word docx
Problem description
I know this is a repeated question, but the other answers did not work for me. I have a Word file that contains one table, and I want that table as the output of my Python program. I'm using Python 3.6 and have installed python-docx. Here is my data extraction code:
from docx.api import Document

# Open the document and take its first table
document = Document('test_word.docx')
table = document.tables[0]

data = []
keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)
    # The first row holds the column headers
    if i == 0:
        keys = tuple(text)
        continue
    # Map each remaining row onto the headers
    row_data = dict(zip(keys, text))
    data.append(row_data)

print(data)
I want the result to look exactly like the table in the Word docx file. Thanks in advance.
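One way to get output that visually resembles the table is to pad every column to a common width before printing. The following is only a minimal sketch: `format_grid` is a hypothetical helper, the sample `rows` stand in for the text that `[[cell.text for cell in row.cells] for row in table.rows]` would give, and it assumes every row has the same number of cells.

```python
def format_grid(rows):
    """Render a list of rows (each a list of strings) as an aligned text grid."""
    # Width of each column = length of the longest cell in that column.
    # zip(*rows) transposes rows into columns (assumes equal-length rows).
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        padded = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append("| " + " | ".join(padded) + " |")
    return "\n".join(lines)


# Hypothetical sample data standing in for the text read from the docx table.
rows = [
    ["1", "2", "3", "4"],
    ["5", "6", "7", "8"],
    ["9", "10", "11", "12"],
]
grid = format_grid(rows)
print(grid)
```

This only reproduces the cell text and the row/column layout; formatting such as borders, fonts, or merged cells in the Word file is not carried over.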
Recommended answer
Your code works fine for me. How about loading the result into a DataFrame?
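import pandas as pd
from docx.api import Document

# Open the document and take its first table
document = Document('test_word.docx')
table = document.tables[0]

data = []
keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)
    # The first row holds the column headers
    if i == 0:
        keys = tuple(text)
        continue
    # Map each remaining row onto the headers
    row_data = dict(zip(keys, text))
    data.append(row_data)

print(data)

# Each dict becomes one row; the header texts become the column names
df = pd.DataFrame(data)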
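Note that `cell.text` in python-docx returns a string, so every value collected this way is text, including the header names. If numeric values are wanted, one option is to convert the cells before building the DataFrame. The sketch below is an illustration, not part of the original answer: `to_number` and `rows_to_records` are hypothetical helpers, the sample `rows` stand in for the cell text of the table, and it assumes the table has at least a header row.

```python
def to_number(text):
    """Return an int or float if the text parses as one, else the stripped text."""
    stripped = text.strip()
    for cast in (int, float):
        try:
            return cast(stripped)
        except ValueError:
            continue
    return stripped


def rows_to_records(rows):
    """Treat the first row as the header and map every later row onto it."""
    header, *body = rows
    return [dict(zip(header, (to_number(cell) for cell in row))) for row in body]


# Hypothetical sample data standing in for the text read from the docx table.
rows = [["1", "2", "3", "4"], ["5", "6", "7", "8"], ["9", "10", "11", "12"]]
records = rows_to_records(rows)
print(records)
```

The resulting list of dicts can be passed to `pd.DataFrame(records)` in the same way as `data` above; the column names remain strings such as `"1"`, since only the body cells are converted.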
How can I display a particular row and column in that table? We can extract rows and columns by index with iloc:
# iloc[row, column]; cell.text always returns str, so the values are strings
df.iloc[0, :].tolist()   # ['5', '6', '7', '8'] - row index 0
df.iloc[:, 0].tolist()   # ['5', '9', '13', '17'] - column index 0
df.iloc[0, 0]            # '5' - cell (0, 0)
df.iloc[1:, 2].tolist()  # ['11', '15', '19'] - column index 2, skipping the first row
And so on...
However, if your columns have names (in this case the names are the numbers from the header row), you can select a column by name:
# df["name"].tolist()
df["1"].tolist()  # ['5', '9', '13', '17'] - column named "1" (the header text is a string)
print(df)
prints the following, which is how the table looks in my sample doc:
    1   2   3   4
0   5   6   7   8
1   9  10  11  12
2  13  14  15  16
3  17  18  19  20