Pandas left join on duplicate keys but without increasing the number of columns


Question

I'm trying to combine two different dataframes I've imported in Python with pandas. They are the results of some eye-tracking I've done. One of them, however, contains both the class and the method the user has gazed upon, meaning that for some rows of dataframe1, dataframe2 has an extra row. This doesn't happen for every row, so I can't just duplicate the rows. What I was thinking was to add another row every time dataframe2 has two rows with the same index value. Kind of like this:

import pandas as pd

dataframe1 = pd.DataFrame({'index':[1,2,3],'a':['asd','fgh','qwe'],'b':['dsa','hgf','ewq'],'c':['sad','gfh','wqe']})
dataframe1=dataframe1[['index','a','b','c']]
dataframe1
   index    a    b    c
0      1  asd  dsa  sad
1      2  fgh  hgf  gfh
2      3  qwe  ewq  wqe

dataframe2 = pd.DataFrame({'index':[1,1,2,3,3],'d':['zxc','cxz','xzc','zxc','xcz']})
dataframe2=dataframe2[['index','d']]
dataframe2
   index    d
0      1  zxc
1      1  cxz
2      2  xzc
3      3  zxc
4      3  xcz

Expected result:

index, a, b, c, d
1, asd, dsa, sad, zxc
1, nan, nan, nan, cxz
2, fgh, hgf, gfh, xzc
3, qwe, ewq, wqe, zxc
3, nan, nan, nan, xcz

Are there any built-in functions for this? The values can also just be the values of the previous row with the same index.

Recommended answer

Use pd.merge with an additional cumcount column:

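# df is dataframe1 and df2 is dataframe2 from the question.
u = df2.assign(cnt=df2.groupby('index').cumcount())
v = df.assign(cnt=df.groupby('index').cumcount())

# Recent pandas versions require the axis to be given by keyword.
u.merge(v, on=['index', 'cnt'], how='left').drop(columns='cnt')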

   index    d    a    b    c
0      1  zxc  asd  dsa  sad
1      1  cxz  NaN  NaN  NaN
2      2  xzc  fgh  hgf  gfh
3      3  zxc  qwe  ewq  wqe
4      3  xcz  NaN  NaN  NaN
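
For reference, the approach above can be put together as a self-contained sketch using the frames from the question. The helper name left_join_on_occurrence is hypothetical (it is not part of pandas), and the sketch assumes neither frame already has a column called 'cnt'.

```python
import pandas as pd

dataframe1 = pd.DataFrame({
    'index': [1, 2, 3],
    'a': ['asd', 'fgh', 'qwe'],
    'b': ['dsa', 'hgf', 'ewq'],
    'c': ['sad', 'gfh', 'wqe'],
})
dataframe2 = pd.DataFrame({
    'index': [1, 1, 2, 3, 3],
    'd': ['zxc', 'cxz', 'xzc', 'zxc', 'xcz'],
})


def left_join_on_occurrence(left, right, key):
    """Hypothetical helper: left-join `right` onto `left`, pairing the
    n-th occurrence of each key in `left` with the n-th in `right`."""
    # Number repeated keys 0, 1, 2, ... within each frame.
    left_numbered = left.assign(cnt=left.groupby(key).cumcount())
    right_numbered = right.assign(cnt=right.groupby(key).cumcount())
    # Join on the key plus its occurrence number, then drop the helper column.
    merged = left_numbered.merge(right_numbered, on=[key, 'cnt'], how='left')
    return merged.drop(columns='cnt')


result = left_join_on_occurrence(dataframe2, dataframe1, 'index')
# Reorder the columns to match the layout asked for in the question.
result = result[['index', 'a', 'b', 'c', 'd']]
print(result)
```

The final column selection is only there to reproduce the column order of the expected result; the merge itself puts the columns of the left frame first.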


Details

We introduce cumulative counts for the duplicate values in "index", so that each repeated key is numbered 0, 1, 2, ... in order of appearance.

u = df2.assign(cnt=df2.groupby('index').cumcount())
u
   index    d  cnt
0      1  zxc    0
1      1  cxz    1
2      2  xzc    0
3      3  zxc    0
4      3  xcz    1

v = df.assign(cnt=df.groupby('index').cumcount())
v
   index    a    b    c  cnt
0      1  asd  dsa  sad    0
1      2  fgh  hgf  gfh    0
2      3  qwe  ewq  wqe    0

We then force a LEFT JOIN with respect to u on "index" and "cnt". Because v has no row with cnt equal to 1, the second occurrence of each duplicated key in u finds no match. This way, NaNs are introduced into the result:

u.merge(v, on=['index', 'cnt'], how='left')

   index    d  cnt    a    b    c
0      1  zxc    0  asd  dsa  sad
1      1  cxz    1  NaN  NaN  NaN
2      2  xzc    0  fgh  hgf  gfh
3      3  zxc    0  qwe  ewq  wqe
4      3  xcz    1  NaN  NaN  NaN

The last step is to delete the temporary "cnt" column, for example with .drop(columns='cnt').
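
The question also allows the extra rows to carry the values of the previous row with the same index instead of NaN. A minimal sketch of that variant, assuming the keys in dataframe1 are unique (as they are in the example data), is a plain left join on "index" alone, with no "cnt" column:

```python
import pandas as pd

dataframe1 = pd.DataFrame({
    'index': [1, 2, 3],
    'a': ['asd', 'fgh', 'qwe'],
    'b': ['dsa', 'hgf', 'ewq'],
    'c': ['sad', 'gfh', 'wqe'],
})
dataframe2 = pd.DataFrame({
    'index': [1, 1, 2, 3, 3],
    'd': ['zxc', 'cxz', 'xzc', 'zxc', 'xcz'],
})

# Joining on 'index' only: every dataframe2 row picks up the single matching
# dataframe1 row, so a, b and c are repeated rather than filled with NaN.
repeated = dataframe2.merge(dataframe1, on='index', how='left')
repeated = repeated[['index', 'a', 'b', 'c', 'd']]
print(repeated)
```

If dataframe1 could itself contain repeated keys, this plain join would multiply rows, and the cumcount-based join is the safer choice.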
