在Pandas数据框中对值进行矢量化查找 [英] Vectorized look-up of values in Pandas dataframe
问题描述
我有两个熊猫数据框,一个叫做"orders",另一个叫做"daily_prices". daily_prices如下:
I have two pandas dataframes one called 'orders' and another one called 'daily_prices'. daily_prices is as follows:
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.50 155.74 75.89
2011-02-02 341.29 612.00 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.40 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82.00
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
顺序如下:
direction size ticker prices
2011-01-10 Buy 1500 AAPL 339.44
2011-01-13 Sell 1500 AAPL 342.64
2011-01-13 Buy 4000 IBM 143.92
2011-01-26 Buy 1000 GOOG 616.50
2011-02-02 Sell 4000 XOM 79.46
2011-02-10 Buy 4000 XOM 79.68
2011-03-03 Sell 1000 GOOG 609.56
2011-03-03 Sell 2200 IBM 158.73
2011-06-03 Sell 3300 IBM 160.97
2011-05-03 Buy 1500 IBM 167.84
2011-06-10 Buy 1200 AAPL 323.03
2011-08-01 Buy 55 GOOG 606.77
2011-08-01 Sell 55 GOOG 606.77
2011-12-20 Sell 1200 AAPL 392.46
两个数据帧的
索引均为datetime.date. 通过使用列表理解来循环遍历所有订单并在"daily_prices"数据框中查找特定日期的特定报价,然后将该列表作为列添加到订单"数据框中的价格"列中, 订单"数据框.我想使用数组操作而不是循环执行此操作.能做到吗我尝试使用:
index of both dataframes is datetime.date. 'prices' column in the 'orders' dataframe was added by using a list comprehension to loop through all the orders and look up the specific ticker for the specific date in the 'daily_prices' data frame and then adding that list as a column to the 'orders' dataframe. I would like to do this using an array operation rather than something that loops. can it be done? i tried to use:
daily_prices.ix [日期,股票代码]
daily_prices.ix[dates,tickers]
,但这将返回两个列表的笛卡尔乘积矩阵.我希望它返回一个仅在指定日期显示指定报价的价格的列向量.
but this returns a matrix of cartesian product of the two lists. i want it to return a column vector of only the price of a specified ticker for a specified date.
推荐答案
使用为此目的而专门设计的朋友lookup
:
Use our friend lookup
, designed precisely for this purpose:
In [17]: prices
Out[17]:
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.50 155.74 75.89
2011-02-02 341.29 612.00 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.40 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82.00
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
In [18]: orders
Out[18]:
Date direction size ticker prices
0 2011-01-10 00:00:00 Buy 1500 AAPL 339.44
1 2011-01-13 00:00:00 Sell 1500 AAPL 342.64
2 2011-01-13 00:00:00 Buy 4000 IBM 143.92
3 2011-01-26 00:00:00 Buy 1000 GOOG 616.50
4 2011-02-02 00:00:00 Sell 4000 XOM 79.46
5 2011-02-10 00:00:00 Buy 4000 XOM 79.68
6 2011-03-03 00:00:00 Sell 1000 GOOG 609.56
7 2011-03-03 00:00:00 Sell 2200 IBM 158.73
8 2011-06-03 00:00:00 Sell 3300 IBM 160.97
9 2011-05-03 00:00:00 Buy 1500 IBM 167.84
10 2011-06-10 00:00:00 Buy 1200 AAPL 323.03
11 2011-08-01 00:00:00 Buy 55 GOOG 606.77
12 2011-08-01 00:00:00 Sell 55 GOOG 606.77
13 2011-12-20 00:00:00 Sell 1200 AAPL 392.46
In [19]: prices.lookup(orders.Date, orders.ticker)
Out[19]:
array([ 339.44, 342.64, 143.92, 616.5 , 79.46, 79.68, 609.56,
158.73, 160.97, 167.84, 323.03, 606.77, 606.77, 392.46])
这篇关于在Pandas数据框中对值进行矢量化查找的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!