n-dimensional table lookup: array, dataframe, or dictionary?

Question

I'm trying to find the best way to do n-dimensional table lookups. In this example, there is a dataframe that contains a person's state and the year, and I want to find the relevant tax rate by looking it up in a table (which could be an array, dataframe, or dictionary). First, consider doing it via an array:

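import numpy as np
import pandas as pd

nobs = 4
# use integer division: np.tile needs an integer repeat count
df = pd.DataFrame( { 'state' : np.tile( [ 'tx', 'ny'], nobs//2 ),
                     'year'  : np.tile( [ 2008, 2008, 2009, 2009 ], nobs//4 ) } )

# maps each state label to its column position in rate_arr
dct = { 'tx':0, 'ny':1 }

# rows are 2008 and 2009, columns are 'tx' and 'ny'
rate_arr = np.array( [[.05,.06],
                      [.08,.09]] )

df['rate1'] = rate_arr[ df.year-2008, df.state.map(dct) ]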

  state  year  rate1
0    tx  2008   0.05
1    ny  2008   0.06
2    tx  2009   0.08
3    ny  2009   0.09

The above is exactly what I want; I just want to see if there is a better way. For example, is there a good way to label a numpy array?

Using a dataframe as a lookup would seem to give me the automatic translation of state and year values, but I can only get this to work for one dimension, not two:

rate_df = pd.DataFrame( { 2008: [ .05, .06 ],
                          2009: [ .08, .09 ] } , index=(['tx','ny']) )

# doesn't work
df['rate3'] = rate_df[ df.year, df.state ]

Alternatively, maybe a nested dictionary? Again, I can get this to work in one dimension but not two:

rate_dict = { 'tx': { 2008: .05, 2009: .08 },
              'ny': { 2008: .06, 2009: .09 } }

# doesn't work
df['rate2'] = df.year.map( df.state.map(rate_dict) )

Recommended answer

You're looking for DataFrame.lookup:

In [21]: rate_df.lookup(df['state'], df['year'])
Out[21]: array([ 0.05,  0.06,  0.08,  0.09])

In [22]: df['rate2'] = rate_df.lookup(df['state'], df['year'])

In [23]: df
Out[23]:
  state  year  rate1  rate2
0    tx  2008   0.05   0.05
1    ny  2008   0.06   0.06
2    tx  2009   0.08   0.08
3    ny  2009   0.09   0.09
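
DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0.0, so the call above only runs on older versions. The sketch below is one possible replacement, not part of the original answer: it translates the row and column labels to integer positions with Index.get_indexer and then indexes the underlying NumPy array. The names row_pos, col_pos and rates are illustrative, and the missing-label check is an added assumption about how unknown states or years should be handled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['tx', 'ny', 'tx', 'ny'],
                   'year': [2008, 2008, 2009, 2009]})
rate_df = pd.DataFrame({2008: [.05, .06],
                        2009: [.08, .09]}, index=['tx', 'ny'])

# Translate labels to integer positions (get_indexer returns -1 for unknown labels).
row_pos = rate_df.index.get_indexer(df['state'])
col_pos = rate_df.columns.get_indexer(df['year'])

# Assumption: an unknown state or year should be an error rather than
# silently picking the last row/column via the -1 position.
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError('state or year not present in rate_df')

# Pairwise (row, column) lookup with NumPy integer-array indexing.
rates = rate_df.to_numpy()[row_pos, col_pos]
df['rate2'] = rates
```

This gives the same pairwise lookup that lookup performed: element i of the result is the rate at (state i, year i).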


Note: you can specify the index and columns to get a labelled DataFrame from a numpy array:

In [11]: rate_df = pd.DataFrame(rate_arr.T, index=['tx', 'ny'], columns=[2008, 2009])

In [12]: rate_df
Out[12]:
    2008  2009
tx  0.05  0.08
ny  0.06  0.09

Update: I needed to transpose the numpy array so that rate_df was correctly oriented.
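
For the nested-dictionary variant raised in the question, and for tables with more than two key dimensions, here are two further sketches. Neither comes from the original answer. The first walks the nested dict once per row. The second flattens the table to long format (one row per key combination) and left-merges on all key columns, which extends to any number of keys by adding columns. The names rate_long, rate_from_dict and rate_from_merge are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'state': ['tx', 'ny', 'tx', 'ny'],
                   'year': [2008, 2008, 2009, 2009]})
rate_dict = {'tx': {2008: .05, 2009: .08},
             'ny': {2008: .06, 2009: .09}}

# Option 1: look up each (state, year) pair in the nested dict.
# Raises KeyError if a state or year is missing.
df['rate_from_dict'] = [rate_dict[s][y]
                        for s, y in zip(df['state'], df['year'])]

# Option 2: flatten the table to long format, then left-merge on the key columns.
rate_long = pd.DataFrame(
    [(s, y, r) for s, years in rate_dict.items() for y, r in years.items()],
    columns=['state', 'year', 'rate_from_merge'])
merged = df.merge(rate_long, on=['state', 'year'], how='left')
```

With how='left' the merge keeps the row order of df, and a key combination absent from rate_long yields NaN instead of an error.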
