Merging data frames in pandas

Question

pandas.merge acts differently for the left and right sides. If I pass left_on and left_index together for the left side, it raises an error, but passing right_on and right_index together for the right side works.

Code:

import pandas as pd
import numpy as np
right = pd.DataFrame(data=np.arange(12).reshape((6,2)),index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],[2001, 2000, 2000, 2000, 2001, 2002]],columns=['event1','event2'])
left = pd.DataFrame(data={'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'key2':[2000, 2001, 2002, 2001, 2002],'data':np.arange(5.)})
pd.merge(left, right, right_index=True, left_index=True, right_on='event1')  # works and returns an empty table, as expected
pd.merge(left, right, left_index=True, right_index=True, left_on='key1')  # raises an error

Recommended answer

You have a few issues going on. First, your merge statements are not constructed correctly. You shouldn't pass both left_on and left_index, or both right_on and right_index, at the same time. Use exactly one left option and one right option.
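As a minimal sketch of "one option per side", the snippet below matches the two left columns key1 and key2 against the two levels of right's index. That this is the join the asker wanted is an assumption; the question does not say which keys were meant to line up.

```python
import numpy as np
import pandas as pd

# Same frames as in the question.
right = pd.DataFrame(
    data=np.arange(12).reshape((6, 2)),
    index=[
        ["Nevada", "Nevada", "Ohio", "Ohio", "Ohio", "Ohio"],
        [2001, 2000, 2000, 2000, 2001, 2002],
    ],
    columns=["event1", "event2"],
)
left = pd.DataFrame(
    data={
        "key1": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
        "key2": [2000, 2001, 2002, 2001, 2002],
        "data": np.arange(5.0),
    }
)

# One option per side: left_on for the left frame, right_index for the right.
# The two left columns pair up with the two levels of right's index.
merged = pd.merge(
    left, right, left_on=["key1", "key2"], right_index=True, how="left"
)
print(merged)
```

With how='left', every row of left is kept; a left row with no matching index entry in right gets NaN in event1 and event2, and a left row matching several index entries is repeated.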

The reason you get an error in your second statement is that the index levels do not match. In your first statement, the left index has a single level, and although you specify both right_index=True and right_on='event1', the right_on argument takes precedence. Since the left index and event1 are both single-level integers, there is no problem. I should point out that the merge, if constructed correctly (pd.merge(left, right, left_index=True, right_on='event1', how='left')), does not produce an empty DataFrame. See the code below.

In your second statement, right_index=True selects the right index, and left_on takes precedence over left_index=True. The issue here is that the right index has two levels, whereas your 'key1' field is a single-level string column.
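The level mismatch can be reproduced in isolation. The sketch below passes a single left key against right's two-level index and catches the resulting exception. The exact exception message is an assumption that may vary between pandas versions, so the code only records that a ValueError (or a subclass of it) was raised.

```python
import numpy as np
import pandas as pd

# Same frames as in the question.
right = pd.DataFrame(
    data=np.arange(12).reshape((6, 2)),
    index=[
        ["Nevada", "Nevada", "Ohio", "Ohio", "Ohio", "Ohio"],
        [2001, 2000, 2000, 2000, 2001, 2002],
    ],
    columns=["event1", "event2"],
)
left = pd.DataFrame(
    data={
        "key1": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
        "key2": [2000, 2001, 2002, 2001, 2002],
        "data": np.arange(5.0),
    }
)

# right is indexed by (state, year), so its index has more than one level.
print(right.index.nlevels)

# One left key cannot be paired with a multi-level right index.
error = None
try:
    pd.merge(left, right, left_on="key1", right_index=True)
except ValueError as exc:
    error = exc

print(type(error).__name__, error)
```

Passing as many left keys as the right index has levels avoids the mismatch.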

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: right = pd.DataFrame(data=np.arange(12).reshape((6,2)),index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],[2001, 2000, 2000, 2000, 2001, 2002]],columns=['event1','event2'])

In [4]: left = pd.DataFrame(data={'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'key2':[2000, 2001, 2002, 2001, 2002],'data':np.arange(5.)})

In [5]: left
Out[5]:
   data    key1  key2
0     0    Ohio  2000
1     1    Ohio  2001
2     2    Ohio  2002
3     3  Nevada  2001
4     4  Nevada  2002

In [6]: right
Out[6]:
             event1  event2
Nevada 2001       0       1
       2000       2       3
Ohio   2000       4       5
       2000       6       7
       2001       8       9
       2002      10      11

In [7]: left_merge = left.merge(right, left_index=True, right_on='event1', how='left')

In [8]: left_merge
Out[8]:
             data    key1  key2  event1  event2
Nevada 2001     0    Ohio  2000       0       1
Ohio   2002     1    Ohio  2001       1     NaN
Nevada 2000     2    Ohio  2002       2       3
Ohio   2002     3  Nevada  2001       3     NaN
       2000     4  Nevada  2002       4       5
