Create a new dataframe in pandas with a dynamic name and also add a new column

Question

I have a dataframe df:

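import numpy as np
import pandas as pd

# pd.tslib is no longer public; pd.Timestamp is the supported name
df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2],
                   't': [pd.Timestamp.now(), pd.Timestamp.now(), np.nan]})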

and added a new column:

df['YearMonth'] = df['t'].map(lambda x: 100*x.year + x.month)
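The same YearMonth column can also be computed without a Python-level lambda by using the `.dt` accessor. This is a sketch on a small hypothetical frame with fixed dates and one missing timestamp; the missing (NaT) row comes out as NaN, as it does with the `map` version.

```python
import pandas as pd

# Hypothetical sample data: two fixed dates and one missing value (NaT)
df = pd.DataFrame({'t': pd.to_datetime(['2016-12-05', '2012-08-01', None])})

# Vectorised equivalent of 100*x.year + x.month; NaT propagates as NaN
df['YearMonth'] = df['t'].dt.year * 100 + df['t'].dt.month
print(df)
```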

Now I want to write a function or macro that does a date comparison, creates a new dataframe and adds a new column to that dataframe.

I tried the following, but it seems I am doing something wrong:

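def test(df, ym):
    df_new = df                     # binds a second name to the same object; no copy is made
    if (ym <= df['YearMonth']):     # a boolean Series in an if raises "truth value ... is ambiguous"
        df_new+"_"+ym = df_new      # invalid: cannot assign to an expression
        return df_new+"_"+ym
    df_new+"_"+ym['new_col'] = ym   # invalid for the same reason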

Now, when I call the test function, I want a new dataframe named df_new_201612 to be created, and this new dataframe should have one more column, named new_col, whose value is ym on every row.

test(df,201612)

The new dataframe should look like this:

df_new_201612

A   B   ID  t                           YearMonth   new_col
-a  a   1   2016-12-05 12:37:56.374620  201612      201612
1   NaN 2   2016-12-05 12:37:56.374644  201208      201612
a   c   2   NaT                         NaN         201612

Recommended answer

Creating variables with dynamic names is typically a bad practice.

I think the best solution for your problem is to store your dataframes into a dictionary and dynamically generate the name of the key to access each dataframe.

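dict_of_df = {}
for ym in [201511, 201612, 201710]:
    # Build the dictionary key dynamically instead of creating a variable
    key_name = 'df_new_' + str(ym)

    # df.copy() is a deep copy by default, same as copy.deepcopy(df)
    dict_of_df[key_name] = df.copy()

    # Set new_col to ym only on rows whose YearMonth is earlier than ym
    to_change = df['YearMonth'] < ym
    dict_of_df[key_name].loc[to_change, 'new_col'] = ym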
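The same loop can be written as a dictionary comprehension. This is a sketch on a hypothetical frame with a YearMonth column: `Series.where` keeps ym only where YearMonth < ym and leaves NaN elsewhere, which matches what the `.loc` assignment produces when new_col does not exist yet. `DataFrame.assign` returns a new frame, so no explicit copy is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with a YearMonth column (NaN for a missing date)
df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'YearMonth': [201612, 201208, np.nan]})

dict_of_df = {
    'df_new_' + str(ym): df.assign(
        # ym where YearMonth < ym; NaN on the other rows, including NaN YearMonth
        new_col=pd.Series(ym, index=df.index).where(df['YearMonth'] < ym)
    )
    for ym in [201511, 201612, 201710]
}

print(dict_of_df['df_new_201710'])
```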

dict_of_df.keys()
Out[36]: ['df_new_201710', 'df_new_201612', 'df_new_201511']

dict_of_df
Out[37]: 
{'df_new_201511':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201612
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201612
 2   a    c   2 2016-12-05 07:53:35.943     201612   201612,
 'df_new_201612':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201612
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201612
 2   a    c   2 2016-12-05 07:53:35.943     201612   201612,
 'df_new_201710':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201710
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201710
 2   a    c   2 2016-12-05 07:53:35.943     201612   201710}

# Extract a single dataframe
df_2015 = dict_of_df['df_new_201511']
