Fill a Pandas dataframe using information from another Pandas dataframe


Problem description

I have one Pandas dataframe that contains information thus:

index       year  month day symbol transaction  nr_shares
2011-01-10  2011  1     10  AAPL       Buy       1500
2011-01-13  2011  1     13  GOOG       Sell      1000

and I would like to fill a second, zero-filled Pandas dataframe

index        AAPL  GOOG
2011-01-10     0     0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0     0

using the information from the first dataframe so I get

index        AAPL  GOOG
2011-01-10   1500    0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0  -1000

where it can be seen that on the relevant dates the buy and sell transactions for a specified number of shares have been entered in the appropriate column, with a positive number for a buy and a negative number for a sell order.

How can I accomplish this? Will I have to loop over the first dataframe index and check the symbol and transaction columns using nested "if" statements and then write to the second dataframe, or is there a more elegant dataframe method that I could use?

Solution

You could use pivot_table. Starting from (edited to be slightly more complicated):

>>> df1
        index  year  month  day symbol transaction  nr_shares
0  2011-01-10  2011      1   10   AAPL         Buy       1500
1  2011-01-10  2011      1   10   AAPL        Sell        200
2  2011-01-10  2011      1   10   GOOG        Sell        500
3  2011-01-10  2011      1   10   GOOG         Buy        600
4  2011-01-13  2011      1   13   GOOG        Sell       1000
>>> df2
        index  AAPL  GOOG
0  2011-01-10     0     0
1  2011-01-11     0     0
2  2011-01-12     0     0
3  2011-01-13     0     0

We can sign the shares:

>>> df1["nr_shares"] = df1.apply(lambda row: row["nr_shares"] * (-1 if row["transaction"] == "Sell" else 1), axis=1)
>>> df1
        index  year  month  day symbol transaction  nr_shares
0  2011-01-10  2011      1   10   AAPL         Buy       1500
1  2011-01-10  2011      1   10   AAPL        Sell       -200
2  2011-01-10  2011      1   10   GOOG        Sell       -500
3  2011-01-10  2011      1   10   GOOG         Buy        600
4  2011-01-13  2011      1   13   GOOG        Sell      -1000
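
The row-wise apply works, but it calls a Python function once per row. A vectorized sketch of the same sign step is shown below, assuming the transaction column contains only "Buy" and "Sell". The names `sign` and `signed` are illustrative and not part of the original answer, and the year/month/day columns are omitted for brevity.

```python
import pandas as pd

# Sample data mirroring df1 above (unsigned share counts).
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "transaction": ["Buy", "Sell", "Sell", "Buy", "Sell"],
    "nr_shares": [1500, 200, 500, 600, 1000],
})

# Map each transaction type to a multiplier.
# Any label other than "Buy"/"Sell" would map to NaN (assumption: none exist).
sign = df1["transaction"].map({"Buy": 1, "Sell": -1})

# Element-wise product gives positive buys and negative sells.
signed = df1["nr_shares"] * sign
print(signed.tolist())
```

Assigning `signed` back to `df1["nr_shares"]` would reproduce the signed frame shown above.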

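And then you can pivot df1. By default pivot_table aggregates with the mean, but we want the sum. (Current pandas names the keyword arguments index and columns; older releases called them rows and cols.)

>>> a = df1.pivot_table(values="nr_shares", index="index", columns="symbol",
                    aggfunc="sum")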
>>> a
symbol      AAPL  GOOG
index                 
2011-01-10  1300   100
2011-01-13   NaN -1000
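
The NaN appears because AAPL has no trades on 2011-01-13. As a hedged aside, `pivot_table` also accepts a `fill_value` argument that replaces such missing cells directly. The sketch below assumes the shares are already signed. It only fills cells for dates that occur in df1, so the dates with no trades at all (2011-01-11 and 2011-01-12) are still absent from the result.

```python
import pandas as pd

# Signed trades, as in df1 after the sign step above.
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "nr_shares": [1500, -200, -500, 600, -1000],
})

# fill_value replaces missing cells in the pivoted result with 0.
a = df1.pivot_table(values="nr_shares", index="index", columns="symbol",
                    aggfunc="sum", fill_value=0)
print(a)
```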

Set the same index on df2, calling the result b:

>>> b = df2.set_index("index")
>>> b
            AAPL  GOOG
index                 
2011-01-10     0     0
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0     0

And then add them:

>>> (a+b).fillna(0)
symbol      AAPL  GOOG
index                 
2011-01-10  1300   100
2011-01-11     0     0
2011-01-12     0     0
2011-01-13     0 -1000
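
For reference, the whole procedure can be sketched end to end against current pandas. This version swaps the final addition for `reindex`, which is an alternative to the step in the answer and not the answer's own method. It assumes the transaction column holds only "Buy" and "Sell", and the variable names `signed` and `result` are illustrative.

```python
import pandas as pd

# Trades (df1) and the zero-filled target frame (df2), as in the question.
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "transaction": ["Buy", "Sell", "Sell", "Buy", "Sell"],
    "nr_shares": [1500, 200, 500, 600, 1000],
})
df2 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13"],
    "AAPL": 0,
    "GOOG": 0,
})

# Step 1: sign the shares (sells become negative).
signed = df1.assign(
    nr_shares=df1["nr_shares"] * df1["transaction"].map({"Buy": 1, "Sell": -1})
)

# Step 2: one row per date, one column per symbol, summing same-day trades.
a = signed.pivot_table(values="nr_shares", index="index", columns="symbol",
                       aggfunc="sum")

# Step 3: align to df2's dates and symbols. fill_value covers the newly added
# dates; fillna covers cells that were already NaN after the pivot.
b = df2.set_index("index")
result = (
    a.reindex(index=b.index, columns=b.columns, fill_value=0)
     .fillna(0)
     .astype(int)
)
print(result)
```

Because df2 is all zeros, reindexing to its labels yields the same values as `(a+b).fillna(0)`; if df2 held existing positions, the addition would be the appropriate step.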
