Pandas - 2 dataframes, add Index column of df1 to df2 on second column


Problem description

I have 2 dataframes:

df1 (sample, has more columns):

+---+----------------+--------------+-----------+
|   |     Region     | Placement ID |   Units   |
+---+----------------+--------------+-----------+
| 0 | Western Europe | 1.10872E+13  | 367628.76 |
| 1 | Western Europe | 1.10872E+13  | 367628.76 |
| 2 | Western Europe | 1.10872E+13  | 74604.63  |
+---+----------------+--------------+-----------+

df2 (sample, has more columns):

+-----------+----------------+--------------+
| Creatives | Publisher Name | Placement ID |
+-----------+----------------+--------------+
| Temenos   | Quantcast      | 1.10872E+13  |
| Temenos   | Quantcast      | 1.10872E+13  |
| Temenos   | Quantcast      | 1.10872E+13  |
+-----------+----------------+--------------+

What I would like to do is add an extra column to dataframe 2 that holds the index of the matching row in dataframe 1, matched on Placement ID.

Some Placement ID fields in dataframe 1 or 2 might be empty or contain erroneous values. If there is no match, or an error is found, I would like to add a Missing or Error value such as N/A or Missing, or leave the cell blank.

Recommended answer

IIUC you need merge, but the duplicated Placement ID values in df1 cause a problem. So first remove them with drop_duplicates, then select two columns from df1: the one to add (Units) and the one to join on (Placement ID):

import pandas as pd

# Keep one row per Placement ID in df1, then left-join Units onto df2.
print(pd.merge(df2,
               df1.drop_duplicates('Placement ID')[['Units', 'Placement ID']],
               how='left',
               on='Placement ID'))


  Creatives Publisher Name  Placement ID      Units
0   Temenos      Quantcast  1.108720e+13  367628.76
1   Temenos      Quantcast  1.108720e+13  367628.76
2   Temenos      Quantcast  1.108720e+13  367628.76
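
The question also asks what happens when a Placement ID has no match. A minimal sketch of that case follows, assuming made-up integer IDs and a deliberately unmatched row in df2 (the sample values are hypothetical, not the asker's data). With how='left' every row of df2 is kept and unmatched rows get NaN; the indicator=True option of pd.merge adds a _merge column that shows which rows found a partner.

```python
import pandas as pd

# Hypothetical sample data modelled on the tables in the question.
df1 = pd.DataFrame({
    "Region": ["Western Europe"] * 3,
    "Placement ID": [11087200000001, 11087200000001, 11087200000002],
    "Units": [367628.76, 367628.76, 74604.63],
})
df2 = pd.DataFrame({
    "Creatives": ["Temenos"] * 3,
    "Publisher Name": ["Quantcast"] * 3,
    # The last ID does not exist in df1.
    "Placement ID": [11087200000001, 11087200000002, 99999999999999],
})

# One row per Placement ID, keeping only the join key and the column to add.
lookup = df1.drop_duplicates("Placement ID")[["Units", "Placement ID"]]

# Left join: all df2 rows survive; _merge is "both" or "left_only".
merged = pd.merge(df2, lookup, how="left", on="Placement ID", indicator=True)
print(merged)
```

Rows flagged left_only are the ones that would need a Missing or N/A label.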

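If you need to add the index instead, call reset_index first so the index becomes a regular column. Note that reset_index names the new column index by default; it is called level_0 here, which is the name pandas falls back to when a column named index already exists: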

print (pd.merge(df2, 
                df1.drop_duplicates('Placement ID')
                   .reset_index()[['level_0','Placement ID']], 
                how='left', 
                on='Placement ID'))
  Creatives Publisher Name  Placement ID  level_0
0   Temenos      Quantcast  1.108720e+13        0
1   Temenos      Quantcast  1.108720e+13        0
2   Temenos      Quantcast  1.108720e+13        0
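
To avoid depending on whether the new column ends up as index or level_0, one option is to name the index before resetting it. The sketch below assumes hypothetical integer IDs and a made-up column name df1_index; it also turns unmatched rows into the label "Missing", which is one of the outcomes the question asks for.

```python
import pandas as pd

# Hypothetical sample data; the last df2 row has no match in df1.
df1 = pd.DataFrame({
    "Region": ["Western Europe"] * 3,
    "Placement ID": [11087200000001, 11087200000001, 11087200000002],
    "Units": [367628.76, 367628.76, 74604.63],
})
df2 = pd.DataFrame({
    "Creatives": ["Temenos"] * 3,
    "Publisher Name": ["Quantcast"] * 3,
    "Placement ID": [11087200000001, 11087200000002, 99999999999999],
})

# Name the index so reset_index() produces a predictable column name.
lookup = (
    df1.drop_duplicates("Placement ID")
       .rename_axis("df1_index")
       .reset_index()[["df1_index", "Placement ID"]]
)

result = df2.merge(lookup, how="left", on="Placement ID")

# The left join introduces NaN, which turns the column into floats;
# the nullable Int64 dtype keeps whole numbers alongside missing values.
result["df1_index"] = result["df1_index"].astype("Int64")

# Optional text version with an explicit label for unmatched rows.
labels = [str(int(v)) if pd.notna(v) else "Missing" for v in result["df1_index"]]
result["df1_index_label"] = labels
print(result)
```

Because drop_duplicates keeps the first occurrence, each Placement ID maps to the index of its first row in df1.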

The duplicates have to be dropped because merge pairs every row of df2 with every df1 row that shares its join key. The value 1.108720e+13 appears 3 times in df2 and 3 times in df1, so without drop_duplicates you get 3 x 3 = 9 rows:

print (pd.merge(df2, 
                df1.reset_index()[['level_0', 'Placement ID']], 
                how='left', 
                on='Placement ID'))

  Creatives Publisher Name  Placement ID  level_0
0   Temenos      Quantcast  1.108720e+13        0
1   Temenos      Quantcast  1.108720e+13        1
2   Temenos      Quantcast  1.108720e+13        2
3   Temenos      Quantcast  1.108720e+13        0
4   Temenos      Quantcast  1.108720e+13        1
5   Temenos      Quantcast  1.108720e+13        2
6   Temenos      Quantcast  1.108720e+13        0
7   Temenos      Quantcast  1.108720e+13        1
8   Temenos      Quantcast  1.108720e+13        2
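
The row multiplication can be reproduced and guarded against with a small sketch. It assumes toy frames named left and right (hypothetical names and values) and uses the validate argument of merge, which raises pandas.errors.MergeError when the right-hand keys are not unique.

```python
import pandas as pd

# Toy frames: the same key three times on each side.
left = pd.DataFrame({"Placement ID": [1, 1, 1], "Creatives": ["Temenos"] * 3})
right = pd.DataFrame({"Placement ID": [1, 1, 1], "Units": [10.0, 20.0, 30.0]})

# Every left row pairs with every matching right row: 3 x 3 rows.
many_to_many = left.merge(right, how="left", on="Placement ID")
print(len(many_to_many))

# validate="many_to_one" requires unique keys on the right side.
try:
    left.merge(right, how="left", on="Placement ID", validate="many_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True
print(duplicate_keys_detected)

# After dropping duplicate keys the check passes and the row count is preserved.
deduped = left.merge(
    right.drop_duplicates("Placement ID"),
    how="left",
    on="Placement ID",
    validate="many_to_one",
)
print(len(deduped))
```

Passing validate turns a silent increase in row count into an explicit error, which helps when the lookup frame is expected to hold one row per key.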
