Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?


Problem description


Many spreadsheets have formulas and formatting that Python tools for reading and writing Excel files cannot faithfully reproduce. That means that any file I want to create programmatically must be something I basically create from scratch, and then other Excel files (with the aforementioned sophistication) have to refer to that file (which creates a variety of other dependency issues).

My understanding of Excel file 'tabs' is that they're actually just a collection of XML files. So, is it possible to use pandas (or one of the underlying read/write engines such as xlsxwriter or openpyxl) to modify just one of the tabs, leaving the other tabs (with more wicked stuff in there) intact?
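An .xlsx file is a ZIP archive in the Office Open XML format, and each worksheet is stored as its own XML part. This makes the "collection of XML files" idea concrete. Below is a minimal sketch that uses only the standard library's zipfile to list those parts. The function name list_worksheet_parts is hypothetical. The xl/worksheets/ prefix reflects how Excel usually lays out the package, not a guarantee.

A worksheet part is not fully self-contained. Cell text often lives in the shared xl/sharedStrings.xml part, and styles live in xl/styles.xml.

```python
import zipfile


def list_worksheet_parts(xlsx_path):
    """Return the sorted names of the per-sheet XML parts in an .xlsx package.

    xlsx_path may be a filesystem path or a binary file object.
    Relationship files (*.rels) under xl/worksheets/ are excluded.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )
```

For a workbook saved by Excel, list_worksheet_parts("test.xlsx") would typically return names such as 'xl/worksheets/sheet1.xml'. The mapping from tab names like "Sheet3" to these parts is recorded in xl/workbook.xml and xl/_rels/workbook.xml.rels, not in the file names themselves.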

EDIT: I'll try to further articulate the problem with an example.

  • The Excel workbook test.xlsx has four tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4
  • I read Sheet3 into a DataFrame (let's call it df) using pandas.read_excel()
  • Sheet1 and Sheet2 contain formulas, graphs, and various formatting that neither openpyxl nor xlrd can successfully parse, and Sheet4 contains other data. I don't want to touch those tabs at all.
  • Sheet2 actually has some references to cells on Sheet3
  • I make some edits to df and now want to write it back to Sheet3, leaving the other sheets untouched (and the references to it from other worksheets in the workbook intact)

Can I do that and, if so, how?
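One low-level way to act on the XML-parts idea is to rebuild the archive and swap out only the target sheet's part. Every other member is copied byte-for-byte, so pandas or openpyxl never re-saves the workbook. The sketch below shows this approach. It assumes you have already produced valid sheet XML for Sheet3 (not shown) and know which part holds it. The helper replace_zip_member is hypothetical.

Some shared parts may still need attention. Strings stored in xl/sharedStrings.xml and a stale xl/calcChain.xml are two examples. Excel may also ask to repair the file if the new XML is inconsistent with the rest of the package.

```python
import zipfile


def replace_zip_member(src, dst, member_name, new_bytes):
    """Copy the ZIP archive src to dst, replacing the contents of one member.

    All other members are copied unchanged (same names, same compression
    settings), so parts this code never parses are preserved as stored.
    src and dst may be paths or binary file objects; dst must not be the
    same file as src, because opening it for writing truncates it.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        found = False
        for item in zin.infolist():
            if item.filename == member_name:
                # Reuse the original ZipInfo so name and compression type match.
                zout.writestr(item, new_bytes)
                found = True
            else:
                zout.writestr(item, zin.read(item.filename))
        if not found:
            raise KeyError(f"{member_name!r} not found in archive")
```

A hypothetical call would be replace_zip_member("test.xlsx", "test_edited.xlsx", "xl/worksheets/sheet3.xml", new_sheet_xml). Here new_sheet_xml stands for the bytes you generated, and the result is written to a new file.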

Solution

I had a similar question about the interaction between Excel and Python (in particular, pandas), and I was referred to this question.

Thanks to some pointers from the Stack Overflow community, I found a package called xlwings that seems to cover a lot of the functionality HaPsantran needed.

To use the OP's example:

Working with an existing Excel file, you can drop an anchor in the data block (on Sheet3) that you want to import into pandas by giving that cell a name in Excel, and then do:

# Legacy xlwings API (before 0.9.0)
import pandas as pd
from xlwings import Workbook, Range

# Open an existing Excel file (Existing_file is the path to your workbook)
wb = Workbook(Existing_file)

# Find the named cell (Anchor), expand to the boundary of the surrounding
# block (delimited by an empty row/column), and read its values
data = Range(Anchor).table.value

# Load the block into pandas and manipulate it
df = pd.DataFrame(data)
df['sum'] = df.sum(axis=1)

# Write back to Sheet3, starting at the same anchor cell
Range(Anchor).value = df.values
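Range(...).table.value returns the block as a nested list of rows. As a result, pd.DataFrame(data) in the code above treats a header row (if the block has one) as ordinary data, and df.values drops the column labels on the way back. The sketch below shows a round trip that keeps the first row as column headers. rows_to_frame and frame_to_rows are hypothetical helpers, and the code assumes a rectangular block whose first row holds labels.

```python
import pandas as pd


def rows_to_frame(rows, header=True):
    """Turn the nested list read from a 2-D range into a DataFrame.

    With header=True the first row supplies the column labels.
    """
    if header:
        return pd.DataFrame(rows[1:], columns=rows[0])
    return pd.DataFrame(rows)


def frame_to_rows(df, header=True):
    """Inverse of rows_to_frame: a nested list that can be assigned to a range."""
    body = df.values.tolist()
    return [list(df.columns)] + body if header else body
```

With the legacy API, the read would become df = rows_to_frame(Range(Anchor).table.value), and the write-back would become Range(Anchor).value = frame_to_rows(df).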

I tested that this implementation doesn't tamper with existing formulas in the Excel file.

Let me know if this solves your problem, and if there's anything else I can help with.

Big kudos to the developer of xlwings for making this possible.


Below is an update to my earlier answer, following a further question from @jamzsabb and reflecting the API changes in xlwings >= 0.9.0.

import xlwings as xw
import pandas as pd

# Only do this if 'A7' (a cell within the area of interest) is on the active worksheet:
target_df = xw.Range('A7').options(pd.DataFrame, expand='table').value

# Otherwise, address the workbook and sheet explicitly:
# sht = xw.Book(r'path to your xlsx file\name_of_file.xlsx').sheets['name of sheet']
# target_df = sht.range('A7').options(pd.DataFrame, expand='table').value
# You can also replace 'A7' with any name you have given to a cell, such as 'interest_table'.
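The updated snippet only reads the block. Below is a sketch of the full read, modify, and write-back cycle with the >= 0.9 API. It assumes a local Excel installation, because xlwings drives Excel itself. anchor is assumed to be an A1 address such as 'A7' or a cell name, as described above. add_row_sum and update_block_in_place are hypothetical names.

Excel, not a Python parser, writes the file. The other sheets should therefore keep their formulas, charts, and formatting, which matches the observation above that existing formulas were untouched.

```python
import pandas as pd


def add_row_sum(df):
    """Return a copy of df with a 'sum' column holding each row's numeric total."""
    return df.assign(sum=df.sum(axis=1, numeric_only=True))


def update_block_in_place(path, sheet_name, anchor):
    """Read the table at `anchor`, add a row-sum column, and write it back in place.

    Needs xlwings >= 0.9 and a local Excel installation. xlwings is imported
    here rather than at module level so add_row_sum can be used without Excel.
    """
    import xlwings as xw

    wb = xw.Book(path)  # opens (or attaches to) the workbook in Excel
    rng = wb.sheets[sheet_name].range(anchor)
    # With xlwings' defaults the first row becomes the header
    # and the first column becomes the index.
    df = rng.options(pd.DataFrame, expand="table").value
    # Assigning a DataFrame to a single cell fills down and right from it,
    # header and index included, so the block lands where it came from.
    rng.value = add_row_sum(df)
    wb.save()  # Excel saves the workbook; the other sheets are not rewritten by Python
```

A hypothetical call would be update_block_in_place(r"path to your xlsx file\test.xlsx", "Sheet3", "A7"). If the transformed table were smaller than the original, the leftover cells would keep their old values and would need clearing first.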
