如何在基于其他数据框的数据框中创建联接? [英] How to create a join in Dataframe based on the other dataframe?

查看:106
本文介绍了如何在基于其他数据框的数据框中创建联接?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有2个数据框.一个包含学生批次详细信息,另一个包含分数.我想加入2个数据框.

I have 2 dataframes. One containing student batch details and another one with points. I want to join 2 dataframes.

Dataframe1包含

Dataframe1 contains

+-------+-------+-------+--+
|  s1   |  s2   |  s3   |  |
+-------+-------+-------+--+
| Stud1 | Stud2 | Stud3 |  |
| Stud2 | Stud4 | Stud1 |  |
| Stud1 | Stud3 | Stud4 |  |
+-------+-------+-------+--+

Dataframe2包含

Dataframe2 contains

+-------+-------+----------+--+
| Name  | Point | Category |  |
+-------+-------+----------+--+
| Stud1 |    90 | Good     |  |
| Stud2 |    80 | Average  |  |
| Stud3 |    95 | Good     |  |
| Stud4 |    55 | Poor     |  |
+-------+-------+----------+

我正在尝试将标记映射到每个学生的相同数据集中.

I am trying to map the mark in the same dataset for each student.

+-------+-------+-------+----+----+----+
| Stud1 | Stud2 | Stud3 | 90 | 80 | 95 |
| Stud2 | Stud4 | Stud1 | 80 | 55 | 90 |
| Stud1 | Stud3 | Stud4 | 90 | 95 | 55 |
+-------+-------+-------+----+----+----+

我尝试了下面的代码,但它正在将值一一替换.

I tried below code but it is replacing the values one by one.

s = df3['p1'].map(dfnamepoints.set_index('name')['points'])
df4 = df3.drop('p1', 1).assign(points = s)

推荐答案

如果df3中的所有值都存在于列Name中,则解决方案同样起作用:

Solution working same if all values from df3 exist in column Name:

s = dfnamepoints.set_index('Name')['Point']
df = df3.join(df3.replace(s).add_prefix('new_'))

或者:

df = df3.join(df3.apply(lambda x: x.map(s)).add_prefix('new_'))

或者:

df = df3.join(df3.applymap(s.get).add_prefix('new_'))

print (df)
      s1     s2     s3  new_s1  new_s2  new_s3
0  Stud1  Stud2  Stud3      90      80      95
1  Stud2  Stud4  Stud1      80      55      90
2  Stud1  Stud3  Stud4      90      95      55

如果不是,则输出是不同的-对于不存在的值(Stud1),得到NaN s:

If not, output is different - for not exist values (Stud1) get NaNs:

print (dfnamepoints)
    Name  Point Category
0  Stud2     80  Average
1  Stud3     95     Good
2  Stud4     55     Poor

df = df3.join(df3.applymap(s.get).add_prefix('new_'))
#or 
df = df3.join(df3.applymap(s.get).add_prefix('new_'))

print (df)
      s1     s2     s3  new_s1  new_s2  new_s3
0  Stud1  Stud2  Stud3     NaN      80    95.0
1  Stud2  Stud4  Stud1    80.0      55     NaN
2  Stud1  Stud3  Stud4     NaN      95    55.0

对于replace,请获取原始值:

df = df3.join(df3.replace(s).add_prefix('new_'))
print (df)
      s1     s2     s3 new_s1  new_s2 new_s3
0  Stud1  Stud2  Stud3  Stud1      80     95
1  Stud2  Stud4  Stud1     80      55  Stud1
2  Stud1  Stud3  Stud4  Stud1      95     55

这篇关于如何在基于其他数据框的数据框中创建联接?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆