Pandas: add dataframes to dataframe - match on index and column value

Question

I am trying to add pandas dataframes of different lengths to another dataframe such that the values in the result are aligned on both the (time) index and a key value from a column that is present in all dataframes.

Say I want to combine df1, df2 and df3, merging on the index and the column 'id':

df1
            id value1
2015-05-01   1     13
2015-05-01   2     14
2015-05-02   1     15
2015-05-02   2     16

df2
            id  value2
2015-05-01   1       4
2015-05-02   1       5

df3
            id  value2
2015-05-01   2       7
2015-05-02   2       8

What I would like is to get a dataframe that looks like

df
            id   value1 value2
2015-05-01   1       13      4
2015-05-01   2       14      7
2015-05-02   1       15      5
2015-05-02   2       16      8

but I struggle with the merge function.

Recommended answer

If your DataFrames look like this:

import pandas as pd

df1 = pd.DataFrame({'id':[1,2,1,2], 'value1':[13,14,15,16]}, index=pd.DatetimeIndex(['2015-5-1', '2015-5-1', '2015-5-2', '2015-5-2']))
df2 = pd.DataFrame({'id':[1,1], 'value2':[4,5]}, index=pd.DatetimeIndex(['2015-5-1', '2015-5-2']))
df3 = pd.DataFrame({'id':[2,2], 'value2':[7,8]}, index=pd.DatetimeIndex(['2015-5-1', '2015-5-2']))

you could concatenate all the DataFrames:

df = pd.concat([df1,df2,df3])
#             id  value1  value2
# 2015-05-01   1      13     NaN
# 2015-05-01   2      14     NaN
# 2015-05-02   1      15     NaN
# 2015-05-02   2      16     NaN
# 2015-05-01   1     NaN       4
# 2015-05-02   1     NaN       5
# 2015-05-01   2     NaN       7
# 2015-05-02   2     NaN       8

Since the result is being aligned on both the date and the id, it's natural to set id as an index. Then if we stack the DataFrame we get this Series:

series = df.set_index(['id'], append=True).stack()
#             id        
# 2015-05-01  1   value1    13
#             2   value1    14
# 2015-05-02  1   value1    15
#             2   value1    16
# 2015-05-01  1   value2     4
# 2015-05-02  1   value2     5
# 2015-05-01  2   value2     7
# 2015-05-02  2   value2     8
# dtype: float64

Now if we turn around and unstack the Series, the values are aligned based on the remaining index -- the date and the id:

result = series.unstack()

yields

               value1  value2
           id                
2015-05-01 1       13       4
           2       14       7
2015-05-02 1       15       5
           2       16       8
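
Putting the steps together, a minimal end-to-end sketch might look like the following. It rebuilds df1, df2 and df3 as defined above. The explicit .dropna() is an added precaution that is not part of the original answer: it is meant to cover pandas versions in which stack() may keep NaN rows. The final reset_index call, which turns id back into an ordinary column, is likewise an assumption about the desired shape.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-05-01', '2015-05-01', '2015-05-02', '2015-05-02'])
df1 = pd.DataFrame({'id': [1, 2, 1, 2], 'value1': [13, 14, 15, 16]}, index=dates)
df2 = pd.DataFrame({'id': [1, 1], 'value2': [4, 5]},
                   index=pd.DatetimeIndex(['2015-05-01', '2015-05-02']))
df3 = pd.DataFrame({'id': [2, 2], 'value2': [7, 8]},
                   index=pd.DatetimeIndex(['2015-05-01', '2015-05-02']))

# Concatenate, move id into the index, then stack into a long Series.
# dropna() is a precaution in case stack() keeps the NaN entries.
stacked = pd.concat([df1, df2, df3]).set_index('id', append=True).stack().dropna()

# Unstacking the innermost level aligns value1 and value2 on (date, id).
result = stacked.unstack()

# Assumption: the caller wants id as a column again, as in the question.
flat = result.reset_index(level='id')
print(flat)
```

Because value1 and value2 pass through NaN-containing columns, the resulting values are floats; cast them back with astype(int) if integer columns are needed.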

Note that unstack() requires the remaining index to be unique, meaning there are no duplicate (date, id) entries. If there are duplicate entries, then it's not clear what the desired output should be. One way to address the issue would be to group by the date and id and aggregate the values. Another option would be to pick one value and drop the others.
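
Both options can be sketched roughly as follows. The duplicated sample row, the choice of mean as the aggregation, and the index name 'date' are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical data: the pair (2015-05-01, id=1) appears twice.
df = pd.DataFrame(
    {'id': [1, 1, 2], 'value1': [13.0, 17.0, 14.0]},
    index=pd.DatetimeIndex(['2015-05-01', '2015-05-01', '2015-05-01'], name='date'),
)
indexed = df.set_index('id', append=True)

# Option 1: group by date and id, then aggregate (mean chosen arbitrarily).
aggregated = indexed.groupby(level=['date', 'id']).mean()

# Option 2: keep the first row for each (date, id) pair and drop the rest.
first_only = indexed[~indexed.index.duplicated(keep='first')]

print(aggregated)
print(first_only)
```

Either result has a unique (date, id) index, so the stack/unstack step can then be applied without ambiguity.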
