Pandas - 2 dataframes, add Index column of df1 to df2 on second column
Question
I have 2 dataframes:
df1 (sample, has more columns):
+---+----------------+--------------+-----------+
| | Region | Placement ID | Units |
+---+----------------+--------------+-----------+
| 0 | Western Europe | 1.10872E+13 | 367628.76 |
| 1 | Western Europe | 1.10872E+13 | 367628.76 |
| 2 | Western Europe | 1.10872E+13 | 74604.63 |
+---+----------------+--------------+-----------+
df2 (sample, has more columns):
+-----------+----------------+--------------+
| Creatives | Publisher Name | Placement ID |
+-----------+----------------+--------------+
| Temenos | Quantcast | 1.10872E+13 |
| Temenos | Quantcast | 1.10872E+13 |
| Temenos | Quantcast | 1.10872E+13 |
+-----------+----------------+--------------+
What I would like to do is add an extra column to dataframe 2 containing the index of dataframe 1, matched on Placement ID.
Some Placement ID fields in dataframe 1 or 2 might be empty or contain erroneous values. If there is no match, or an error is found, I would like the new column to hold a Missing or Error value instead, such as N/A or Missing, or simply be left blank.
Answer
IIUC you need merge, but there is a problem with duplicates, so first remove them with drop_duplicates, then select two columns: one to add (Units) and one to join on (Placement ID):
import pandas as pd

# keep one df1 row per Placement ID, then left-join its Units onto df2
print(pd.merge(df2,
               df1.drop_duplicates('Placement ID')[['Units', 'Placement ID']],
               how='left',
               on='Placement ID'))
Creatives Publisher Name Placement ID Units
0 Temenos Quantcast 1.108720e+13 367628.76
1 Temenos Quantcast 1.108720e+13 367628.76
2 Temenos Quantcast 1.108720e+13 367628.76
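The question also asks about empty or erroneous Placement IDs. Below is a minimal sketch, assuming the IDs are meant to be numeric: pd.to_numeric(errors="coerce") turns unparseable values into NaN, and merge's indicator=True flags df2 rows that found no partner, which a hypothetical Status column then labels as Missing. NaN keys are dropped from df1 first because pandas, unlike SQL, matches null keys against each other. The "bad" and None IDs and the extra df1 row are invented for illustration.

```python
import pandas as pd

# Sample data modelled on the question's tables. The None and "bad" IDs
# (and the fourth df1 row) are hypothetical, added to show unmatched keys.
df1 = pd.DataFrame({
    "Region": ["Western Europe"] * 4,
    "Placement ID": [1.10872e13, 1.10872e13, 1.10872e13, None],
    "Units": [367628.76, 367628.76, 74604.63, 100.0],
})
df2 = pd.DataFrame({
    "Creatives": ["Temenos"] * 4,
    "Publisher Name": ["Quantcast"] * 4,
    "Placement ID": [1.10872e13, 1.10872e13, "bad", None],
})

# Erroneous IDs (e.g. "bad") and blanks become NaN instead of raising.
for df in (df1, df2):
    df["Placement ID"] = pd.to_numeric(df["Placement ID"], errors="coerce")

# pandas would match NaN keys to each other, so drop them from the lookup
# side, then keep one df1 row per Placement ID.
lookup = (
    df1.dropna(subset=["Placement ID"])
       .drop_duplicates("Placement ID")[["Units", "Placement ID"]]
)

# how="left" keeps every df2 row; indicator=True adds a "_merge" column
# whose value is "both" for matched rows and "left_only" otherwise.
result = pd.merge(df2, lookup, how="left", on="Placement ID", indicator=True)

# Hypothetical status column: label rows that found no partner in df1.
result["Status"] = "OK"
result.loc[result["_merge"] == "left_only", "Status"] = "Missing"
result = result.drop(columns="_merge")
print(result)
```

Unmatched rows keep NaN in Units; if you prefer a text marker there as well, you could fill that column separately, at the cost of turning it into an object column.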
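If you need to add the index instead, use reset_index, which turns the index into a regular column. For a default unnamed index that column is called 'index'; reset_index names it 'level_0' when a column called 'index' already exists. The output below shows 'level_0', so adjust the selected name to match your data: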
print (pd.merge(df2,
df1.drop_duplicates('Placement ID')
.reset_index()[['level_0','Placement ID']],
how='left',
on='Placement ID'))
Creatives Publisher Name Placement ID level_0
0 Temenos Quantcast 1.108720e+13 0
1 Temenos Quantcast 1.108720e+13 0
2 Temenos Quantcast 1.108720e+13 0
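A variant of the reset_index approach, sketched under the assumption that df1 has a default unnamed index: naming the index with rename_axis first makes the new column's name predictable, so you do not have to guess between 'index' and 'level_0'. The column name df1_index and the unmatched ID 2.0e13 are hypothetical. Because an unmatched row turns the merged column into floats with NaN, the sketch converts it to pandas' nullable Int64 dtype, which keeps whole numbers and shows <NA> (the "leave blank" option from the question).

```python
import pandas as pd

df1 = pd.DataFrame({
    "Region": ["Western Europe"] * 3,
    "Placement ID": [1.10872e13] * 3,
    "Units": [367628.76, 367628.76, 74604.63],
})
# The last df2 row uses a hypothetical ID that does not exist in df1.
df2 = pd.DataFrame({
    "Creatives": ["Temenos"] * 4,
    "Publisher Name": ["Quantcast"] * 4,
    "Placement ID": [1.10872e13, 1.10872e13, 1.10872e13, 2.0e13],
})

# Name the index before resetting it, so the resulting column has a known
# name ("df1_index" is an arbitrary, hypothetical choice).
lookup = (
    df1.drop_duplicates("Placement ID")
       .rename_axis("df1_index")
       .reset_index()[["df1_index", "Placement ID"]]
)

result = pd.merge(df2, lookup, how="left", on="Placement ID")

# The unmatched row made df1_index float64 with NaN; Int64 restores
# integer values and displays <NA> for the row without a match.
result["df1_index"] = result["df1_index"].astype("Int64")
print(result)
```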
You need to drop the duplicates because merge returns one row for every pair of rows with matching keys: there are 3 rows with the value 1.108720e+13 in df2 and 3 rows with it in df1, so you get 3 x 3 = 9 rows:
print (pd.merge(df2,
df1.reset_index()[['level_0', 'Placement ID']],
how='left',
on='Placement ID'))
Creatives Publisher Name Placement ID level_0
0 Temenos Quantcast 1.108720e+13 0
1 Temenos Quantcast 1.108720e+13 1
2 Temenos Quantcast 1.108720e+13 2
3 Temenos Quantcast 1.108720e+13 0
4 Temenos Quantcast 1.108720e+13 1
5 Temenos Quantcast 1.108720e+13 2
6 Temenos Quantcast 1.108720e+13 0
7 Temenos Quantcast 1.108720e+13 1
8 Temenos Quantcast 1.108720e+13 2
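To see the row multiplication and guard against it, here is a small sketch using the sample values (other columns omitted). It relies on merge's validate parameter: with validate="many_to_one", pandas raises pandas.errors.MergeError when the keys on the right-hand side (df1) are not unique, instead of silently multiplying rows.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({
    "Placement ID": [1.10872e13] * 3,
    "Units": [367628.76, 367628.76, 74604.63],
})
df2 = pd.DataFrame({
    "Creatives": ["Temenos"] * 3,
    "Placement ID": [1.10872e13] * 3,
})

# How many df1 rows share each key? Any count above 1 multiplies the output.
print(df1["Placement ID"].value_counts())

# Without de-duplication, each df2 row pairs with each matching df1 row.
exploded = pd.merge(df2, df1.reset_index()[["index", "Placement ID"]],
                    how="left", on="Placement ID")
print(len(exploded))  # 9

# validate="many_to_one" makes the duplicate keys in df1 an error.
try:
    pd.merge(df2, df1, how="left", on="Placement ID", validate="many_to_one")
    raised = False
except MergeError as err:
    raised = True
    print("df1 has duplicate keys:", err)

# After drop_duplicates the same check passes and the row count matches df2.
deduped = pd.merge(df2, df1.drop_duplicates("Placement ID"),
                   how="left", on="Placement ID", validate="many_to_one")
print(len(deduped))  # 3
```

Note that drop_duplicates keeps the first df1 row per key by default (keep="first"), so which index or Units value ends up in df2 depends on the row order of df1.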