Append existing excel sheet with new dataframe using python pandas


Problem description



I currently have this code. It works perfectly.

It loops through the Excel files in a folder, removes the first 2 rows, then saves them out as individual Excel files, and it also saves all the files from the loop as a single appended file.

Currently the appended file overwrites the existing file each time I run the code.

I need to append the new data to the bottom of the already existing Excel sheet ('master_data.xlsx').

import os
import pandas as pd

dfList = []
path = 'C:\\Test\\TestRawFile'
newpath = 'C:\\Path\\To\\New\\Folder'

for fn in os.listdir(path):
    # Absolute file path
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(file)
        # View the excel file's sheet names
        xlsx_file.sheet_names
        # Load the xlsx file's data sheet as a dataframe
        df = xlsx_file.parse('Sheet1', header=None)
        # Drop the first 2 rows
        df_NoHeader = df[2:]
        data = df_NoHeader
        # Save individual dataframe
        data.to_excel(os.path.join(newpath, fn))

        dfList.append(data)

# Combine all dataframes and write them to the master file
appended_data = pd.concat(dfList)
appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx'))

I thought this would be a simple task, but I guess not. I think I need to bring in the master_data.xlsx file as a dataframe, then match the index up with the new appended data, and save it back out. Or maybe there is an easier way. Any help is appreciated.
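The read-then-rewrite idea described above could look roughly like the sketch below. It is only an illustration: the helper name append_by_rewriting and the demo data are hypothetical, it assumes openpyxl is installed, and it assumes the existing file and the new data share the same column names.

```python
import os
import tempfile

import pandas as pd


def append_by_rewriting(master_path, new_data):
    """Read the master file if it exists, put new_data below it, rewrite the file."""
    if os.path.isfile(master_path):
        existing = pd.read_excel(master_path)
        # ignore_index=True renumbers the rows so old and new indexes do not clash
        combined = pd.concat([existing, new_data], ignore_index=True)
    else:
        # First run: there is nothing to append to yet
        combined = new_data.reset_index(drop=True)
    combined.to_excel(master_path, index=False)
    return combined


# Demo with a temporary folder (hypothetical data, not the asker's files)
demo_master = os.path.join(tempfile.mkdtemp(), 'master_data.xlsx')
append_by_rewriting(demo_master, pd.DataFrame({'a': [1, 2], 'b': [3, 4]}))
append_by_rewriting(demo_master, pd.DataFrame({'a': [5], 'b': [6]}))
rewritten = pd.read_excel(demo_master)
```

This rereads the whole master file on every run, which is simple but may become slow for large workbooks.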

Solution

You can use the openpyxl engine in conjunction with the startrow parameter:

In [48]: writer = pd.ExcelWriter('c:/temp/test.xlsx', engine='openpyxl')

In [49]: df.to_excel(writer, index=False)

In [50]: df.to_excel(writer, startrow=len(df)+2, index=False)

In [51]: writer.close()

c:/temp/test.xlsx then contains the DataFrame twice, with one blank row between the two copies.

PS: you may also want to specify header=False in the second call if you don't want to duplicate the column names.
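The session above writes both frames in one run. To add rows to a workbook that already exists on disk, one possible approach (an assumption on my part, not something stated in the answer) is to open the writer in append mode and start at the first free row. The sketch below assumes pandas 1.4 or newer, which added if_sheet_exists='overlay', and an installed openpyxl. The helper name append_to_sheet and the demo data are hypothetical.

```python
import os
import tempfile

import pandas as pd


def append_to_sheet(path, df, sheet_name='Sheet1'):
    """Write df below the existing rows of sheet_name; create the file if missing."""
    if not os.path.isfile(path):
        # First run: write the column names once
        df.to_excel(path, sheet_name=sheet_name, index=False)
        return
    # mode='a' keeps the workbook; 'overlay' writes into the existing sheet
    # instead of replacing it (assumes the sheet already exists in the file)
    with pd.ExcelWriter(path, engine='openpyxl', mode='a',
                        if_sheet_exists='overlay') as writer:
        # max_row is the 1-based number of the last used row, which is also
        # the 0-based position of the first free row
        startrow = writer.sheets[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=False)


# Demo with a temporary folder (hypothetical data)
demo_path = os.path.join(tempfile.mkdtemp(), 'master_data.xlsx')
append_to_sheet(demo_path, pd.DataFrame({'a': [1, 2], 'b': [3, 4]}))
append_to_sheet(demo_path, pd.DataFrame({'a': [5], 'b': [6]}))
appended = pd.read_excel(demo_path)
```

Unlike rereading and rewriting the whole file, this only touches the new rows, but it relies on the sheet having no stray formatted cells below the data, since those would raise max_row.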
