Renaming pandas data frame columns using a for loop
Problem description
I'm not sure if this is a dumb way to go about things, but I've got several data frames, all of which have identical columns. I need to rename the columns within each to reflect the name of its data frame (I'll be performing an outer merge of all of these afterwards).
Let's say the data frames are called df1, df2 and df3, and each contains the columns name, date, and count.
I'd like to rename the columns in df1 to name_df1, date_df1, and count_df1.
I've written a function to rename the columns, thus:
df_list=[df1, df2, df3]
def rename_cols():
    col_name="name"+suffix
    col_count="count"+suffix
    col_date="date"+suffix
for x in df_list:
    if x['name'].tail(1).item() == df1['name'].tail(1).item():
        suffix="_"+"df1"
        rename_cols()
        continue
    elif x['name'].tail(1).item() == df2['name'].tail(1).item():
        suffix="_"+"df2"
        rename_cols()
        continue
    else:
        suffix="_"+"df3"
        rename_cols()
    col_names=[col_name,col_date,col_count]
    x.columns=col_names
Unfortunately, I get the following error: KeyError: 'name'
I'm really struggling to figure out why that's going on. The columns of df1, the first data frame in df_list, get renamed. Everything else stays the same... Am I messing up basic syntax (probably), or do I have a fundamental misunderstanding of how things should work?
From what I can ascertain, the first data frame in the list is being iterated through more than once, but why would that be the case?
Recommended answer
I guess you can achieve this with something simpler, like this:
df_list=[df1, df2, df3]
for i, df in enumerate(df_list, 1):
    df.columns = [col_name+'_df{}'.format(i) for col_name in df.columns]
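To try the loop end to end, here is a minimal sketch with made-up sample data. The helper make_frame and the values in it are assumptions for illustration only; the question does not show the real frames. It also shows pandas' add_suffix, which builds the same labels but returns a new frame rather than modifying the existing one.

```python
import pandas as pd


def make_frame(names):
    """Build a hypothetical frame with the question's three columns."""
    return pd.DataFrame({
        "name": names,
        "date": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "count": [1, 2],
    })


# Stand-ins for the question's df1, df2 and df3 (sample values are invented).
df1 = make_frame(["a", "b"])
df2 = make_frame(["c", "d"])
df3 = make_frame(["e", "f"])

df_list = [df1, df2, df3]
for i, df in enumerate(df_list, 1):
    # Assigning to .columns renames in place, so df1, df2 and df3 themselves change.
    df.columns = [col_name + "_df{}".format(i) for col_name in df.columns]

print(list(df1.columns))  # ['name_df1', 'date_df1', 'count_df1']

# add_suffix produces the same labels but returns a new frame.
fresh = make_frame(["g", "h"]).add_suffix("_df1")
print(list(fresh.columns))  # ['name_df1', 'date_df1', 'count_df1']
```

Because enumerate supplies the index, there is no need to inspect the contents of a frame to work out which one it is.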
If your DataFrames have prettier names you can try:
df_names=('Home', 'Work', 'Park')
for df_name in df_names:
    df = globals()[df_name]
    df.columns = [col_name+'_{}'.format(df_name) for col_name in df.columns]
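A variant of the same idea, sketched under the assumption that you can list each frame next to its name: keep an explicit dictionary instead of reading globals(). The dictionary name frames and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical frames; Home, Work and Park mirror the names used above.
Home = pd.DataFrame({"name": ["a"], "date": ["2020-01-01"], "count": [1]})
Work = pd.DataFrame({"name": ["b"], "date": ["2020-01-02"], "count": [2]})
Park = pd.DataFrame({"name": ["c"], "date": ["2020-01-03"], "count": [3]})

# Explicit name -> frame mapping, so no lookup in globals() is needed.
frames = {"Home": Home, "Work": Work, "Park": Park}

for df_name, df in frames.items():
    # Rename in place, suffixing every column with the frame's name.
    df.columns = ["{}_{}".format(col_name, df_name) for col_name in df.columns]

print(list(Work.columns))  # ['name_Work', 'date_Work', 'count_Work']
```

This also works inside a function, where the frames would not be visible through globals().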
Or you can fetch the name of each variable by looking it up in globals() (or locals()):
df_list = [Home, Work, Park]
for df in df_list:
    name = [k for k, v in globals().items() if id(v) == id(df) and k[0] != '_'][0]
    df.columns = [col_name+'_{}'.format(name) for col_name in df.columns]
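As for the KeyError in the question, one likely cause (an assumption, taking the renaming to happen inside the loop) is that every pass evaluates df1['name'], and that column no longer exists once df1 has been renamed. A minimal reproduction with invented sample data:

```python
import pandas as pd

# Hypothetical one-row frame with the question's column names.
df1 = pd.DataFrame({"name": ["a"], "date": ["2020-01-01"], "count": [1]})

# Simulate the first pass of the loop, after which df1 has its new labels.
df1.columns = ["name_df1", "date_df1", "count_df1"]

caught = None
try:
    df1["name"]  # the same lookup the question's if-condition performs
except KeyError as err:
    caught = err
    print("KeyError:", err)  # KeyError: 'name'
```

The loops above avoid this because they never look a column up by its old name after renaming.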