How to create multiple dataframes using multiple functions


Question

I quite often write a function that returns different dataframes depending on the parameters I pass in. Here's an example dataframe:

import numpy as np
import pandas as pd

np.random.seed(1111)
df = pd.DataFrame({
    'Category': np.random.choice(['Group A', 'Group B', 'Group C', 'Group D'], 10000),
    'Sub-Category': np.random.choice(['X', 'Y', 'Z'], 10000),
    'Sub-Category-2': np.random.choice(['G', 'F', 'I'], 10000),
    'Product': np.random.choice(['Product 1', 'Product 2', 'Product 3'], 10000),
    'Units_Sold': np.random.randint(1, 100, size=10000),
    'Dollars_Sold': np.random.randint(100, 1000, size=10000),
    # 25 random 10-character customer names; pd.util.testing was removed in
    # pandas 2.0, so this line needs an older pandas release.
    'Customer': np.random.choice(pd.util.testing.rands_array(10, 25, dtype='str'), 10000),
    # Month-end dates from 2016 through 2018.
    'Date': np.random.choice(pd.date_range('1/1/2016', '12/31/2018', freq='M'), 10000),
})

I then created a function to compute sub-totals for me, like this:

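def some_fun(DF1, agg_column, myList=[], *args):
    # For each i, overwrite the columns in myList[i:] with '[Total]' so the
    # groupby produces a subtotal at that level, then stack all the levels.
    y = pd.concat([
        DF1.assign(**{x: '[Total]' for x in myList[i:]})
           .groupby(myList).agg(sumz=(agg_column, 'sum'))
        for i in range(1, len(myList) + 1)
    ]).sort_index().unstack(0)
    return y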

I then write out the lists that I'll pass as arguments to the function:

list_one = [pd.Grouper(key='Date',freq='A'),'Category','Product']
list_two = [pd.Grouper(key='Date',freq='A'),'Category','Sub-Category','Sub-Category-2']
list_three = [pd.Grouper(key='Date',freq='A'),'Sub-Category','Product']

I then have to run each list through my function, creating a new dataframe each time:

df1 = some_fun(df,'Units_Sold',list_one)
df2 = some_fun(df,'Dollars_Sold',list_two)
df3 = some_fun(df,'Units_Sold',list_three)

I then use a function to write each of these dataframes to an Excel worksheet. This is just an example; I perform this same exercise 10+ times.

My question: is there a better way to perform this task than writing out df1, df2, df3 by hand, each with its own function call? Should I be looking at a dictionary or some other data type to do this more pythonically with a function?

Recommended answer

IIUC, as Thomas has suggested, we can use a dictionary to drive this. With some minor modifications to your function, the dictionary can hold all the required data, which is then passed through to your function.

The idea is to store two types of keys: the arguments to your pd.Grouper call (keyed by the column being aggregated) and the lists of grouping columns.

data_dict = {
    "Units_Sold": {"key": "Date", "freq": "A"},
    "Dollars_Sold": {"key": "Date", "freq": "A"},
    "col_list_1": ["Category", "Product"],
    "col_list_2": ["Category", "Sub-Category", "Sub-Category-2"],
    "col_list_3": ["Sub-Category", "Product"],
}


def some_fun(dataframe, agg_col, dictionary,column_list, *args):

    key = dictionary[agg_col]["key"]

    frequency = dictionary[agg_col]["freq"]

    myList = [pd.Grouper(key=key, freq=frequency), *dictionary[column_list]]

    y = (
        pd.concat(
            [
                dataframe.assign(**{x: "[Total]" for x in myList[i:]})
                .groupby(myList)
                .agg(sumz=(agg_col, "sum"))
                for i in range(1, len(myList) + 1)
            ]
        )
        .sort_index()
        .unstack(0)
    )
    return y


Testing it:

df1 = some_fun(df,'Units_Sold',data_dict,'col_list_3')
print(df1)
                                 sumz                      
Date                   2016-12-31 2017-12-31 2018-12-31
Sub-Category Product                                   
X            Product 1      18308      17839      18776
             Product 2      18067      19309      18077
             Product 3      17943      19121      17675
             [Total]        54318      56269      54528
Y            Product 1      20699      18593      18103
             Product 2      18642      19712      17122
             Product 3      17701      19263      20123
             [Total]        57042      57568      55348
Z            Product 1      19077      17401      19138
             Product 2      17207      21434      18817
             Product 3      18405      17300      17462
             [Total]        54689      56135      55417
[Total]      [Total]       166049     169972     165293
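
The [Total] rows in that output come from the assign step inside the function. Here is a minimal sketch of the same trick on a hypothetical four-row frame, small enough to check by hand. The names toy and subtotals are illustrative only, and a plain Year column stands in for the pd.Grouper so the example does not depend on frequency aliases:

```python
import pandas as pd

# Hypothetical toy data, not the dataframe from the question.
toy = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017],
    "Category": ["A", "A", "B", "A"],
    "Product": ["P1", "P2", "P1", "P1"],
    "Units_Sold": [1, 2, 3, 4],
})


def subtotals(frame, agg_col, group_cols):
    """Sum agg_col at every nesting level, labelling rolled-up columns '[Total]'."""
    pieces = [
        # Overwrite the columns from position i onward, then group as usual:
        # rows that differed only in those columns now collapse into one.
        frame.assign(**{col: "[Total]" for col in group_cols[i:]})
        .groupby(group_cols)[agg_col]
        .sum()
        for i in range(1, len(group_cols) + 1)
    ]
    return pd.concat(pieces).sort_index()


result = subtotals(toy, "Units_Sold", ["Year", "Category", "Product"])
print(result)
```

With i = 1 every column after Year is overwritten, which gives the per-year grand total; with i = 3 nothing is overwritten, which gives the detail rows. The answer's function then calls unstack(0) to move the dates into columns.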

Since you want to automate writing the 10+ worksheets, we can again do that with a dictionary, this time mapping each aggregation column to the column lists it should be run with:

matches = {'Units_Sold': ['col_list_1','col_list_3'],
          'Dollars_Sold' : ['col_list_2']}

Then a simple for loop writes every dataframe to its own sheet in a single Excel workbook; change this to match your required behavior.

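# The context manager saves and closes the workbook on exit
# (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('finished_excel_file.xlsx') as writer:
    for key, value in matches.items():
        for items in value:
            # key is the column to aggregate, items the name of a column list
            dataframe = some_fun(df, key, data_dict, items)
            dataframe.to_excel(writer, sheet_name=f'{key}_{items}')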
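
If the results are also needed in memory, the same nested loop can collect them in a dictionary keyed by sheet name instead of binding df1, df2, df3 by hand. This is a rough sketch under stated assumptions: toy, column_lists, and summarise are hypothetical stand-ins (summarise is a plain grouped sum, not the subtotal function above), and the length check reflects Excel's 31-character limit on worksheet names.

```python
import pandas as pd

# Hypothetical toy data and a stand-in aggregation function.
toy = pd.DataFrame({
    "Category": ["A", "A", "B"],
    "Product": ["P1", "P2", "P1"],
    "Units_Sold": [1, 2, 3],
    "Dollars_Sold": [10, 20, 30],
})

column_lists = {
    "col_list_1": ["Category", "Product"],
    "col_list_2": ["Category"],
}

# Which column lists to run for each aggregation column.
matches = {
    "Units_Sold": ["col_list_1", "col_list_2"],
    "Dollars_Sold": ["col_list_2"],
}


def summarise(frame, agg_col, group_cols):
    # Stand-in for some_fun: grouped sum without subtotal rows.
    return frame.groupby(group_cols)[agg_col].sum()


frames = {}
for agg_col, list_names in matches.items():
    for list_name in list_names:
        sheet_name = f"{agg_col}_{list_name}"
        # Fail early rather than at write time if the name is too long for Excel.
        if len(sheet_name) > 31:
            raise ValueError(f"sheet name too long for Excel: {sheet_name!r}")
        frames[sheet_name] = summarise(toy, agg_col, column_lists[list_name])

print(sorted(frames))
```

Each entry in frames can then be passed to to_excel with its key as the sheet name, so adding an eleventh report means adding one entry to matches rather than another assignment.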
