How to rename columns while reading multiple files using pandas

Question

I have two data frames (read from Excel files) with the columns below.

File 1 columns

person_ID   Test_CODE   REGISTRATION_DATE   subject_CD   subject_DESCRIPTION    subject_TYPE

File 2 columns

person_ID   Test_CODE   REGISTRATION_DATE   subject_Code subject_DESCRIPTION    subject_Indicator

However, the columns subject_CD and subject_Code mean the same thing. Similarly, subject_TYPE and subject_Indicator mean the same thing. So I would like to rename them when I read the Excel files.

I tried the code below, but it doesn't work:

dfs = []       
for f in files:
    df = pd.read_excel(f, sep=",",low_memory=False)
    print(df.columns)
    df1 = df[df.columns.intersection(['person_ID','Test_CODE','REGISTRATION_DATE','subject_CD','subject_DESCRIPTION','subject_TYPE'])].rename(columns={'subject_TYPE':'subject_Indicator','subject_CD':'subject_Code'})
    dfs.append(df1)

Since I would like to append/merge both files, I expect the column names in my final data frame to be as shown below:

person_ID   Test_CODE   REGISTRATION_DATE   subject_Code subject_DESCRIPTION subject_Indicator

Can you help me with this?

Recommended answer

If you want to keep the column names of the first file that is read, you can store the columns from the first iteration and assign them to every subsequent file. Note that this assigns names by position, so it assumes every file has the same number of columns in the same order:

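import pandas as pd

dfs = []
for e, f in enumerate(files):
    df = pd.read_excel(f)
    print(df.columns)
    if e == 0:
        # Remember the column names of the first file
        col = df.columns
    # Overwrite this file's column names by position
    df.columns = col
    dfs.append(df)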


Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
       'subject_DESCRIPTION', 'subject_TYPE'],
      dtype='object')
Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_Code',
       'subject_DESCRIPTION', 'subject_Indicator'],
      dtype='object')


[df.columns for df in dfs] #pd.concat(dfs)

[Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object'),
 Index(['person_ID', 'Test_CODE', 'REGISTRATION_DATE', 'subject_CD',
        'subject_DESCRIPTION', 'subject_TYPE'],
       dtype='object')]
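
The approach above ends up with the first file's names (subject_CD, subject_TYPE), while the question asked for subject_Code and subject_Indicator. A name-based rename is one possible alternative; the following is only a minimal sketch, not part of the original answer. The RENAME_MAP dictionary, the normalize_columns helper, and the two small in-memory frames standing in for the Excel files are all hypothetical names and made-up values used for illustration.

```python
import pandas as pd

# Hypothetical mapping: names used in file 1 -> names wanted in the final frame.
RENAME_MAP = {
    "subject_CD": "subject_Code",
    "subject_TYPE": "subject_Indicator",
}


def normalize_columns(df, rename_map=RENAME_MAP):
    """Return a copy of df with mapped columns renamed.

    DataFrame.rename ignores keys that are not present by default,
    so a frame that already uses the target names passes through unchanged.
    """
    return df.rename(columns=rename_map)


# In-memory stand-ins for the two Excel files (values are made up).
file1 = pd.DataFrame({
    "person_ID": [1],
    "Test_CODE": ["T1"],
    "REGISTRATION_DATE": ["2020-01-01"],
    "subject_CD": ["S1"],
    "subject_DESCRIPTION": ["first subject"],
    "subject_TYPE": ["A"],
})
file2 = pd.DataFrame({
    "person_ID": [2],
    "Test_CODE": ["T2"],
    "REGISTRATION_DATE": ["2020-02-01"],
    "subject_Code": ["S2"],
    "subject_DESCRIPTION": ["second subject"],
    "subject_Indicator": ["B"],
})

# Normalize each frame by name, then stack them into one frame.
dfs = [normalize_columns(df) for df in (file1, file2)]
combined = pd.concat(dfs, ignore_index=True)
print(list(combined.columns))
```

With real files, the stand-in frames would be replaced by pd.read_excel(f) for each f in files. Because the rename works by name rather than by position, it should not depend on the files sharing the same column order.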
