n-dimensional table lookup: array, dataframe, or dictionary?
Question
I'm trying to find the best way to do n-dimensional table lookups. In this example, there is a dataframe that contains each person's state and year, and I want to find the relevant tax rate by looking it up in a table (which could be an array, a dataframe, or a dictionary). First, consider doing it via an array:
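import numpy as np
import pandas as pd
from pandas import DataFrame
nobs = 4
# np.tile needs integer repeat counts, so use floor division
df = DataFrame( { 'state' : np.tile( [ 'tx', 'ny'], nobs//2 ),
'year' : np.tile( [ 2008, 2008, 2009, 2009 ], nobs//4 ) } )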
dct = { 'tx':0, 'ny':1 }
# rows are 2008 and 2009, columns are 'tx' and 'ny'
rate_arr = np.array( [[.05,.06],
[.08,.09]] )
df['rate1'] = rate_arr[ df.year-2008, df.state.map(dct) ]
  state  year  rate1
0    tx  2008   0.05
1    ny  2008   0.06
2    tx  2009   0.08
3    ny  2009   0.09
The above is exactly what I want; I just want to see whether there is a better way. For example, is there a good way to label a numpy array?
Using a dataframe as the lookup table would seem to give me automatic translation of the state and year values, but I can only get this to work for one dimension, not two:
rate_df = DataFrame( { 2008: [ .05, .06 ],
2009: [ .08, .09 ] } , index=(['tx','ny']) )
# doesn't work
df['rate3'] = rate_df[ df.year, df.state ]
Alternatively, maybe a nested dictionary? Again, I can get this to work in one dimension but not two:
rate_dict = { 'tx': { 2008: .05, 2009: .08 },
'ny': { 2008: .06, 2009: .09 } }
# doesn't work
df['rate2'] = df.year.map( df.state.map(rate_dict) )
Answer
You're looking for lookup:
In [21]: rate_df.lookup(df['state'], df['year'])
Out[21]: array([ 0.05, 0.06, 0.08, 0.09])
In [22]: df['rate2'] = rate_df.lookup(df['state'], df['year'])
In [23]: df
Out[23]:
  state  year  rate1  rate2
0    tx  2008   0.05   0.05
1    ny  2008   0.06   0.06
2    tx  2009   0.08   0.08
3    ny  2009   0.09   0.09
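For readers on newer pandas: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so the call above only runs on older versions. A minimal sketch of one possible replacement follows, assuming the same rate table and data as in the question. It translates the labels to integer positions with Index.get_indexer and then uses NumPy fancy indexing. The names row_pos and col_pos are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt here so the sketch is self-contained.
rate_arr = np.array([[.05, .06],
                     [.08, .09]])  # rows: 2008, 2009; columns: 'tx', 'ny'
rate_df = pd.DataFrame(rate_arr.T, index=['tx', 'ny'], columns=[2008, 2009])
df = pd.DataFrame({'state': ['tx', 'ny', 'tx', 'ny'],
                   'year': [2008, 2008, 2009, 2009]})

# Translate labels to integer positions (get_indexer returns -1 for unknown labels).
row_pos = rate_df.index.get_indexer(df['state'])
col_pos = rate_df.columns.get_indexer(df['year'])

# Guard against silent wrap-around: -1 would otherwise pick the last row/column.
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError("some state/year pairs are missing from rate_df")

# Pairwise lookup: element i is rate_df.iloc[row_pos[i], col_pos[i]].
df['rate2'] = rate_df.to_numpy()[row_pos, col_pos]
print(df)
```

This is the same idea as the array approach in the question, except that the DataFrame's own index and columns do the label translation instead of a hand-written dictionary and a year offset.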
Note: you can specify the index and columns to get a labelled DataFrame from a numpy array:
In [11]: rate_df = pd.DataFrame(rate_arr.T, index=['tx', 'ny'], columns=[2008, 2009])
In [12]: rate_df
Out[12]:
    2008  2009
tx  0.05  0.08
ny  0.06  0.09
Update: I needed to transpose the numpy array so that rate_df was correctly oriented.
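Since the question asks about n dimensions, here is a hedged sketch of one way to go beyond two keys: keep the rates in "long" format (one row per key combination) and use a left merge on the key columns. This is not part of the original answer. The rates_long table and the rate3 column name are made up for illustration, and the same pattern should extend to more key columns by adding them to the table and to the on= list.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({'state': ['tx', 'ny', 'tx', 'ny'],
                   'year': [2008, 2008, 2009, 2009]})

# Hypothetical long-format rate table: one row per (state, year) combination.
rates_long = pd.DataFrame({
    'state': ['tx', 'ny', 'tx', 'ny'],
    'year':  [2008, 2008, 2009, 2009],
    'rate3': [.05, .06, .08, .09],
})

# A left merge keeps every row of df in its original order;
# key combinations missing from rates_long would get NaN in 'rate3'.
merged = df.merge(rates_long, on=['state', 'year'], how='left')
print(merged)
```

The trade-off is that a merge is usually slower than positional indexing into an array, but it needs no label-to-position bookkeeping and handles missing combinations explicitly.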