如何使用多个函数创建多个数据框 [英] How to create multiple dataframes using multiple functions
问题描述
我经常编写一个函数来根据我输入的参数返回不同的数据帧.这是一个示例数据框:
I quite often write a function to return different dataframes based on the parameters I enter. Here's an example dataframe:
np.random.seed(1111)
df = pd.DataFrame({
'Category':np.random.choice( ['Group A','Group B','Group C','Group D'], 10000),
'Sub-Category':np.random.choice( ['X','Y','Z'], 10000),
'Sub-Category-2':np.random.choice( ['G','F','I'], 10000),
'Product':np.random.choice( ['Product 1','Product 2','Product 3'], 10000),
'Units_Sold':np.random.randint(1,100, size=(10000)),
'Dollars_Sold':np.random.randint(100,1000, size=10000),
'Customer':np.random.choice(pd.util.testing.rands_array(10,25,dtype='str'),10000),
'Date':np.random.choice( pd.date_range('1/1/2016','12/31/2018',
freq='M'), 10000)})
然后我创建了一个函数来为我执行小计,如下所示:
I then created a function to perform sub-totals for me like this:
def some_fun(DF1, agg_column, myList=[], *args):
y = pd.concat([
DF1.assign(**{x:'[Total]' for x in myList[i:]})\
.groupby(myList).agg(sumz = (agg_column,'sum')) for i in range(1,len(myList)+1)]).sort_index().unstack(0)
return y
然后我写出我将作为参数传递给函数的列表:
I then write out lists that I'll pass as arguments to the function:
list_one = [pd.Grouper(key='Date',freq='A'),'Category','Product']
list_two = [pd.Grouper(key='Date',freq='A'),'Category','Sub-Category','Sub-Category-2']
list_three = [pd.Grouper(key='Date',freq='A'),'Sub-Category','Product']
然后我必须通过我的函数运行每个列表来创建新的数据帧:
I then have to run each list through my function creating new dataframes:
df1 = some_fun(df,'Units_Sold',list_one)
df2 = some_fun(df,'Dollars_Sold',list_two)
df3 = some_fun(df,'Units_Sold',list_three)
然后我使用一个函数将这些数据帧中的每一个写入 Excel 工作表.这只是一个例子 - 我进行了 10 次以上的相同练习.
I then use a function to write each of these dataframes to an Excel worksheet. This is just an example - I perform this same exercise 10+ times.
我的问题 - 有没有比用应用的函数信息写出 df1、df2、df3 更好的方法来执行此任务?我是否应该考虑使用字典或其他一些数据类型来使用函数以 Python 方式执行此操作?
My question - is there a better way to perform this task than to write out df1, df2, df3 with the function information applied? Should I be looking at using a dictionary or some other data type to do this my pythonically with a function?
推荐答案
IIUC,
正如 Thomas 所建议的,我们可以使用字典来解析您的数据,但对您的函数进行一些小的修改,我们可以使用字典来保存所有所需的数据,然后将其传递给你的功能.
as Thomas has suggested we can use a dictionary to parse through your data, but with some minor modifications to your function, we can use the dictionary to hold all the required data then pass that through to your function.
这个想法是将两种类型的键,列列表和参数传递给您的 pd.Grouper
调用.
the idea is to pass two types of keys, the list of columns and the arguments to your pd.Grouper
call.
data_dict = {
"Units_Sold": {"key": "Date", "freq": "A"},
"Dollars_Sold": {"key": "Date", "freq": "A"},
"col_list_1": ["Category", "Product"],
"col_list_2": ["Category", "Sub-Category", "Sub-Category-2"],
"col_list_3": ["Sub-Category", "Product"],
}
<小时>
def some_fun(dataframe, agg_col, dictionary,column_list, *args):
key = dictionary[agg_col]["key"]
frequency = dictionary[agg_col]["freq"]
myList = [pd.Grouper(key=key, freq=frequency), *dictionary[column_list]]
y = (
pd.concat(
[
dataframe.assign(**{x: "[Total]" for x in myList[i:]})
.groupby(myList)
.agg(sumz=(agg_col, "sum"))
for i in range(1, len(myList) + 1)
]
)
.sort_index()
.unstack(0)
)
return y
<小时>
测试.
df1 = some_fun(df,'Units_Sold',data_dict,'col_list_3')
print(df1)
sumz
Date 2016-12-31 2017-12-31 2018-12-31
Sub-Category Product
X Product 1 18308 17839 18776
Product 2 18067 19309 18077
Product 3 17943 19121 17675
[Total] 54318 56269 54528
Y Product 1 20699 18593 18103
Product 2 18642 19712 17122
Product 3 17701 19263 20123
[Total] 57042 57568 55348
Z Product 1 19077 17401 19138
Product 2 17207 21434 18817
Product 3 18405 17300 17462
[Total] 54689 56135 55417
[Total] [Total] 166049 169972 165293
如果您想自动编写 10x 工作表,我们可以再次通过对您的函数进行字典调用来做到这一点:
as you want to automate the writing of the 10x worksheets, we can again do that with a dictionary call over your function:
matches = {'Units_Sold': ['col_list_1','col_list_3'],
'Dollars_Sold' : ['col_list_2']}
然后是一个简单的 for 循环将所有文件写入单个 Excel 工作表,更改它以匹配您所需的行为.
then a simple for loop to write all the files to a single excel sheet, change this to match your required behavior.
writer = pd.ExcelWriter('finished_excel_file.xlsx')
for key,value in matches.items():
for items in value:
dataframe = some_fun(df,k,data_dict,items)
dataframe.to_excel(writer,f'{key}_{items}')
writer.save()
这篇关于如何使用多个函数创建多个数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!