Pandas: join with outer product
Question
I want to multiply a lookup table (demand), given for multiple commodities (here: Water, Elec) and area types (Com, Ind, Res), with a DataFrame (areas) that holds the areas for these area types.
import pandas as pd
areas = pd.DataFrame({'Com':[1,2,3], 'Ind':[4,5,6]})
demand = pd.DataFrame({'Water':[4,3],
                       'Elec':[8,9]}, index=['Com', 'Ind'])
Before:
areas
Com Ind
0 1 4
1 2 5
2 3 6
demand
Elec Water
Com 8 4
Ind 9 3
After:
area_demands
    Com         Ind
   Elec Water  Elec Water
0     8     4    36    12
1    16     8    45    15
2    24    12    54    18
My attempt
Verbose and incomplete; does not work for an arbitrary number of commodities.
areas = pd.DataFrame({'area': areas.stack()})
areas.index.names = ['Edge', 'Type']
both = areas.reset_index(1).join(demand, on='Type')
both['Elec'] = both['Elec'] * both['area']
both['Water'] = both['Water'] * both['area']
del both['area']
# almost there; it must be late, I fail to make 'Type' a hierarchical column...
Almost there:
     Type  Elec  Water
Edge
0     Com     8      4
0     Ind    36     12
1     Com    16      8
1     Ind    45     15
2     Com    24     12
2     Ind    54     18
In short
How can I join/multiply the DataFrames areas and demand together in a decent way?
Recommended answer
import pandas as pd
areas = pd.DataFrame({'Com':[1,2,3], 'Ind':[4,5,6]})
demand = pd.DataFrame({'Water':[4,3],
                       'Elec':[8,9]}, index=['Com', 'Ind'])
def multiply_by_demand(series):
    # .loc replaces the removed .ix indexer; series.name is the area type (e.g. 'Com')
    return demand.loc[series.name].apply(lambda x: x*series).stack()
df = areas.apply(multiply_by_demand).unstack(0)
print(df)
yields
    Com         Ind
   Elec Water  Elec Water
0     8     4    36    12
1    16     8    45    15
2    24    12    54    18
How it works:
First, look at what happens when we call areas.apply(foo). foo gets passed the columns of areas one by one:
def foo(series):
    print(series)
In [226]: areas.apply(foo)
0 1
1 2
2 3
Name: Com, dtype: int64
0 4
1 5
2 6
Name: Ind, dtype: int64
So suppose series is one such column:
In [230]: series = areas['Com']
In [231]: series
Out[231]:
0 1
1 2
2 3
Name: Com, dtype: int64
We can multiply demand by this series this way:
In [229]: demand.loc['Com'].apply(lambda x: x*series)
Out[229]:
0 1 2
Elec 8 16 24
Water 4 8 12
This has half the numbers we want (those for Com), but not in the form we want them. Now apply needs to return a Series, not a DataFrame. One way to turn a DataFrame into a Series is to use stack. Look at what happens if we stack this DataFrame. The columns become a new level of the index:
In [232]: demand.loc['Com'].apply(lambda x: x*areas['Com']).stack()
Out[232]:
Elec 0 8
1 16
2 24
Water 0 4
1 8
2 12
dtype: int64
So, using this as the return value of multiply_by_demand, we get:
In [235]: areas.apply(multiply_by_demand)
Out[235]:
         Com  Ind
Elec  0    8   36
      1   16   45
      2   24   54
Water 0    4   12
      1    8   15
      2   12   18
Now we want the first level of the index to become columns. This can be done with unstack:
In [236]: areas.apply(multiply_by_demand).unstack(0)
Out[236]:
    Com         Ind
   Elec Water  Elec Water
0     8     4    36    12
1    16     8    45    15
2    24    12    54    18
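For comparison, here is a minimal sketch of a different technique, not the approach walked through above: build one block per area type as a literal outer product (np.outer) of its area column and its demand row, then glue the blocks together with pd.concat. The names blocks and area_type are illustrative, and the sketch assumes every column of areas has a matching row in demand.

```python
import numpy as np
import pandas as pd

areas = pd.DataFrame({'Com': [1, 2, 3], 'Ind': [4, 5, 6]})
demand = pd.DataFrame({'Water': [4, 3],
                       'Elec': [8, 9]}, index=['Com', 'Ind'])

# One block per area type: rows are the edges, columns are the commodities.
blocks = {
    area_type: pd.DataFrame(
        np.outer(areas[area_type].to_numpy(), demand.loc[area_type].to_numpy()),
        index=areas.index,
        columns=demand.columns,
    )
    for area_type in areas.columns
}

# The dict keys become the top level of the column MultiIndex.
area_demands = pd.concat(blocks, axis=1)
print(area_demands)
```

Because the commodities are taken from demand.columns, nothing in this sketch is tied to Water and Elec specifically.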
Per the request in the comments, here is the pivot_table solution:
import pandas as pd
areas = pd.DataFrame({'Com':[1,2,3], 'Ind':[4,5,6]})
demand = pd.DataFrame({'Water':[4,3],
                       'Elec':[8,9]}, index=['Com', 'Ind'])
areas = pd.DataFrame({'area': areas.stack()})
areas.index.names = ['Edge', 'Type']
both = areas.reset_index(1).join(demand, on='Type')
both['Elec'] = both['Elec'] * both['area']
both['Water'] = both['Water'] * both['area']
both.reset_index(inplace=True)
# pivot_table's old rows=/cols= keywords are now index=/columns=
both = both.pivot_table(values=['Elec', 'Water'], index='Edge', columns='Type')
both = both.reorder_levels([1,0], axis=1)
both = both.reindex(columns=both.columns[[0,2,1,3]])
print(both)
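The pivot_table version names Elec and Water explicitly, which is the limitation the question complains about. A minimal sketch of the same long-format idea that scales every commodity column at once might look like this. It assumes demand has a unique index covering every column of areas; long_areas, rates, scaled and result are illustrative names.

```python
import pandas as pd

areas = pd.DataFrame({'Com': [1, 2, 3], 'Ind': [4, 5, 6]})
demand = pd.DataFrame({'Water': [4, 3],
                       'Elec': [8, 9]}, index=['Com', 'Ind'])

# Long format: one area value per (Edge, Type) pair.
long_areas = areas.stack()
long_areas.index.names = ['Edge', 'Type']

# Look up the demand row for each pair by its Type, then reuse the long index.
rates = demand.reindex(long_areas.index.get_level_values('Type'))
rates.index = long_areas.index

# Scale all commodity columns by the matching area, row by row.
scaled = rates.mul(long_areas, axis=0)

# Move Type into the columns and put it on top: (Type, commodity).
result = scaled.unstack('Type').swaplevel(axis=1).sort_index(axis=1)
print(result)
```

Sorting the columns at the end groups them by area type, which reproduces the layout asked for in the question without hard-coded column positions.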