Pandas: join with outer product

Problem description

I want to multiply a lookup table (demand), given for multiple commodities (here: Water, Elec) and area types (Com, Ind, Res), with a DataFrame (areas) that holds the areas for these area types.

import pandas as pd
areas = pd.DataFrame({'Com':[1,2,3], 'Ind':[4,5,6]})
# Elec listed first: pandas >= 0.23 keeps dict insertion order, so this matches the outputs below
demand = pd.DataFrame({'Elec':[8,9],
                       'Water':[4,3]}, index=['Com', 'Ind'])

Before:

areas
   Com  Ind
0    1    4
1    2    5
2    3    6

demand
     Elec  Water
Com     8      4
Ind     9      3

After:

area_demands                  
     Com          Ind         
     Elec  Water  Elec  Water 
0       8      4    36     12 
1      16      8    45     15 
2      24     12    54     18 
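Spelled out, every cell of area_demands is a single product. Writing A for areas, d for demand, e for the row label, t for the area type and c for the commodity (symbols chosen here only for illustration), the block for a fixed t is the outer product of the column A_{·,t} with the row d_{t,·}:

```latex
\text{area\_demands}_{e,(t,c)} = A_{e,t}\, d_{t,c},
\qquad \text{e.g.} \quad
\text{area\_demands}_{1,(\mathrm{Ind},\mathrm{Elec})} = A_{1,\mathrm{Ind}}\, d_{\mathrm{Ind},\mathrm{Elec}} = 5 \cdot 9 = 45
```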

My attempt

Verbose and incomplete; does not work for an arbitrary number of commodities.

areas = pd.DataFrame({'area': areas.stack()})
areas.index.names = ['Edge', 'Type']
both = areas.reset_index(1).join(demand, on='Type')
both['Elec'] = both['Elec'] * both['area']
both['Water'] = both['Water'] * both['area']
del both['area']
# almost there; it must be late, I fail to make 'Type' a hierarchical column...

Almost there:

     Type  Elec  Water
Edge
0     Com     8      4
0     Ind    36     12
1     Com    16      8
1     Ind    45     15
2     Com    24     12
2     Ind    54     18
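A minimal sketch of how this attempt could be finished for any number of commodities, assuming a reasonably recent pandas. Instead of naming Elec and Water explicitly, it looks up the whole demand row for every (Edge, Type) pair and keeps Type in the index, so that unstack can turn it into the outer column level. The names area, per_pair, scaled and area_demands are illustrative, not from the original.

```python
import pandas as pd

areas = pd.DataFrame({'Com': [1, 2, 3], 'Ind': [4, 5, 6]})
demand = pd.DataFrame({'Elec': [8, 9], 'Water': [4, 3]}, index=['Com', 'Ind'])

# Long format: one area value per (Edge, Type) pair, as in the attempt above.
area = areas.stack()
area.index.names = ['Edge', 'Type']

# Fetch the demand row for each pair; no commodity column is named, so this
# works for any number of commodities.
per_pair = demand.reindex(area.index.get_level_values('Type'))
per_pair.index = area.index

# Scale every commodity column by the matching area in one vectorized step.
scaled = per_pair.mul(area, axis=0)

# Move 'Type' from the index into the columns, then put it on the outer level
# and sort so each area type's commodities sit together.
area_demands = scaled.unstack('Type').swaplevel(axis=1).sort_index(axis=1)
print(area_demands)
```

Keeping Type in the index (rather than calling reset_index) is what makes the hierarchical column possible: unstack only moves index levels into the columns.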

In short

How do I join/multiply the DataFrames areas and demand together in a decent way?

Recommended answer

import pandas as pd
areas = pd.DataFrame({'Com':[1,2,3], 'Ind':[4,5,6]})
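demand = pd.DataFrame({'Elec':[8,9],
                       'Water':[4,3]}, index=['Com', 'Ind'])

def multiply_by_demand(series):
    # series is one column of areas; series.name is its area type, e.g. 'Com'.
    # Scale that type's row of demand by the column, then stack the resulting
    # (commodity x row) DataFrame into a single Series.
    return demand.loc[series.name].apply(lambda x: x*series).stack()

# apply returns a frame indexed by (commodity, row); unstack(0) moves the
# commodity level into the columns, under each area type.
df = areas.apply(multiply_by_demand).unstack(0)
print(df)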

yields

    Com          Ind       
   Elec  Water  Elec  Water
0     8      4    36     12
1    16      8    45     15
2    24     12    54     18


How it works:

First, look at what happens when we call areas.apply(foo). foo gets passed the columns of areas one-by-one:

def foo(series):
    print(series)

In [226]: areas.apply(foo)
0    1
1    2
2    3
Name: Com, dtype: int64
0    4
1    5
2    6
Name: Ind, dtype: int64

So suppose series is one such column:

In [230]: series = areas['Com']

In [231]: series
Out[231]: 
0    1
1    2
2    3
Name: Com, dtype: int64

We can multiply the 'Com' row of demand by this series like this:

In [229]: demand.loc['Com'].apply(lambda x: x*series)
Out[229]: 
       0   1   2
Elec   8  16  24
Water  4   8  12

This has half the numbers we want, but not in the form we want them: the function passed to areas.apply needs to return a Series, not a DataFrame. One way to turn a DataFrame into a Series is to use stack. Look at what happens if we stack this DataFrame; its columns (0, 1, 2) become a new inner level of the index:

In [232]: demand.loc['Com'].apply(lambda x: x*areas['Com']).stack()
Out[232]: 
Elec   0     8
       1    16
       2    24
Water  0     4
       1     8
       2    12
dtype: int64

So, using this as the return value of multiply_by_demand, we get:

In [235]: areas.apply(multiply_by_demand)
Out[235]: 
         Com  Ind
Elec  0    8   36
      1   16   45
      2   24   54
Water 0    4   12
      1    8   15
      2   12   18

Now we want the first level of the index to become columns. This can be done with unstack:

In [236]: areas.apply(multiply_by_demand).unstack(0)
Out[236]: 
    Com          Ind       
   Elec  Water  Elec  Water
0     8      4    36     12
1    16      8    45     15
2    24     12    54     18
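The 'Com' block computed above is exactly an outer product of areas['Com'] with the 'Com' row of demand, which is where the "outer product" in the title comes from. As an alternative sketch (not the answer's method), the same table can be built with numpy.outer for each area type and pd.concat, which uses the dict keys as the outer column level. outer_block is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

areas = pd.DataFrame({'Com': [1, 2, 3], 'Ind': [4, 5, 6]})
demand = pd.DataFrame({'Elec': [8, 9], 'Water': [4, 3]}, index=['Com', 'Ind'])

def outer_block(area_type):
    # Outer product of one area column with that type's demand row:
    # rows are the areas' rows, columns are the commodities.
    return pd.DataFrame(
        np.outer(areas[area_type], demand.loc[area_type]),
        index=areas.index,
        columns=demand.columns,
    )

# concat over a dict: the keys (area types) become the outer column level.
area_demands = pd.concat({t: outer_block(t) for t in areas.columns}, axis=1)
print(area_demands)
```

This avoids the stack/unstack round trip, at the cost of a Python-level loop over area types (usually few).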


Per the request in the comments, here is the pivot_table solution:

import pandas as pd
areas = pd.DataFrame({'Com':[1,2,3], 'Ind':[4,5,6]})
demand = pd.DataFrame({'Elec':[8,9],
                       'Water':[4,3]}, index=['Com', 'Ind'])

areas = pd.DataFrame({'area': areas.stack()})
areas.index.names = ['Edge', 'Type']
both = areas.reset_index(1).join(demand, on='Type')
both['Elec'] = both['Elec'] * both['area']
both['Water'] = both['Water'] * both['area']
both.reset_index(inplace=True)
both = both.pivot_table(values=['Elec', 'Water'], index='Edge', columns='Type')
both = both.reorder_levels([1,0], axis=1)
both = both.reindex(columns=both.columns[[0,2,1,3]])
print(both)
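Because every (Edge, Type) pair occurs only once, no aggregation is actually needed. A sketch using pivot instead of pivot_table, scaling all demand columns at once and replacing the hard-coded column order with sort_index, could look like this. It assumes pandas >= 0.23 (pivot with a list of values); the names long, commodities and wide are illustrative.

```python
import pandas as pd

areas = pd.DataFrame({'Com': [1, 2, 3], 'Ind': [4, 5, 6]})
demand = pd.DataFrame({'Elec': [8, 9], 'Water': [4, 3]}, index=['Com', 'Ind'])

# Long table with columns Edge, Type, area (one row per (Edge, Type) pair).
area = areas.stack()
area.index.names = ['Edge', 'Type']
long = area.rename('area').reset_index()

# Attach the demand factors for each Type, then scale every commodity column.
both = long.join(demand, on='Type')
commodities = list(demand.columns)
both[commodities] = both[commodities].mul(both['area'], axis=0)

# pivot gives (commodity, Type) columns; swap and sort to get (Type, commodity).
wide = both.pivot(index='Edge', columns='Type', values=commodities)
wide = wide.swaplevel(axis=1).sort_index(axis=1)
print(wide)
```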
