How to create a join in a DataFrame based on another DataFrame?
Question
I have two dataframes. One contains student batch details and the other contains points. I want to join the two dataframes.
Dataframe1 contains:
+-------+-------+-------+
| s1    | s2    | s3    |
+-------+-------+-------+
| Stud1 | Stud2 | Stud3 |
| Stud2 | Stud4 | Stud1 |
| Stud1 | Stud3 | Stud4 |
+-------+-------+-------+
Dataframe2 contains:
+-------+-------+----------+
| Name  | Point | Category |
+-------+-------+----------+
| Stud1 | 90    | Good     |
| Stud2 | 80    | Average  |
| Stud3 | 95    | Good     |
| Stud4 | 55    | Poor     |
+-------+-------+----------+
I am trying to map each student's points into the same dataset, so that the result looks like this:
+-------+-------+-------+----+----+----+
| Stud1 | Stud2 | Stud3 | 90 | 80 | 95 |
| Stud2 | Stud4 | Stud1 | 80 | 55 | 90 |
| Stud1 | Stud3 | Stud4 | 90 | 95 | 55 |
+-------+-------+-------+----+----+----+
I tried the code below, but it replaces the values one column at a time.
s = df3['p1'].map(dfnamepoints.set_index('name')['points'])
df4 = df3.drop('p1', 1).assign(points = s)
Recommended answer
If all values from df3 exist in column Name, the following solutions all give the same result:
s = dfnamepoints.set_index('Name')['Point']
df = df3.join(df3.replace(s).add_prefix('new_'))
Or:
df = df3.join(df3.apply(lambda x: x.map(s)).add_prefix('new_'))
Or:
df = df3.join(df3.applymap(s.get).add_prefix('new_'))
print (df)
s1 s2 s3 new_s1 new_s2 new_s3
0 Stud1 Stud2 Stud3 90 80 95
1 Stud2 Stud4 Stud1 80 55 90
2 Stud1 Stud3 Stud4 90 95 55
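For anyone who wants to reproduce this, here is a minimal self-contained sketch of the second variant. The construction of df3 and dfnamepoints is an assumption based on the tables in the question (the original post does not show how they were built).

```python
import pandas as pd

# Assumed reconstruction of the two dataframes from the question's tables.
df3 = pd.DataFrame({
    "s1": ["Stud1", "Stud2", "Stud1"],
    "s2": ["Stud2", "Stud4", "Stud3"],
    "s3": ["Stud3", "Stud1", "Stud4"],
})
dfnamepoints = pd.DataFrame({
    "Name": ["Stud1", "Stud2", "Stud3", "Stud4"],
    "Point": [90, 80, 95, 55],
    "Category": ["Good", "Average", "Good", "Poor"],
})

# Lookup Series: index is the student name, values are the points.
s = dfnamepoints.set_index("Name")["Point"]

# Map every column of df3 through the lookup, prefix the new columns,
# and join them back onto the original frame by index.
mapped = df3.apply(lambda col: col.map(s)).add_prefix("new_")
df = df3.join(mapped)
print(df)
```

Because join aligns on the row index and add_prefix keeps the new column names distinct, the original name columns and the mapped point columns sit side by side.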
If not, the output is different: values that do not exist in Name (here Stud1) become NaN:
print (dfnamepoints)
Name Point Category
0 Stud2 80 Average
1 Stud3 95 Good
2 Stud4 55 Poor
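# rebuild the lookup Series from the reduced dfnamepoints
s = dfnamepoints.set_index('Name')['Point']
df = df3.join(df3.apply(lambda x: x.map(s)).add_prefix('new_'))
#or
df = df3.join(df3.applymap(s.get).add_prefix('new_'))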
print (df)
s1 s2 s3 new_s1 new_s2 new_s3
0 Stud1 Stud2 Stud3 NaN 80 95.0
1 Stud2 Stud4 Stud1 80.0 55 NaN
2 Stud1 Stud3 Stud4 NaN 95 55.0
With replace, values that have no match keep their original value:
df = df3.join(df3.replace(s).add_prefix('new_'))
print (df)
s1 s2 s3 new_s1 new_s2 new_s3
0 Stud1 Stud2 Stud3 Stud1 80 95
1 Stud2 Stud4 Stud1 80 55 Stud1
2 Stud1 Stud3 Stud4 Stud1 95 55
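The difference between the two behaviours can be checked with a small sketch. The dataframes are again an assumed reconstruction from the question, the name partial is hypothetical, and the lookup is passed to replace as a plain dict here rather than as the Series used above.

```python
import pandas as pd

# Assumed reconstruction of df3 from the question's table.
df3 = pd.DataFrame({
    "s1": ["Stud1", "Stud2", "Stud1"],
    "s2": ["Stud2", "Stud4", "Stud3"],
    "s3": ["Stud3", "Stud1", "Stud4"],
})
# Hypothetical reduced lookup table: Stud1 is deliberately missing.
partial = pd.DataFrame({
    "Name": ["Stud2", "Stud3", "Stud4"],
    "Point": [80, 95, 55],
})
s = partial.set_index("Name")["Point"]

# Series.map: names without a match in the lookup become NaN.
mapped = df3.apply(lambda col: col.map(s))

# DataFrame.replace: names without a match are left unchanged.
replaced = df3.replace(s.to_dict())

print(mapped)
print(replaced)
```

Which one is preferable depends on the goal: map makes missing students visible as NaN, while replace silently leaves the name in place, which can produce columns that mix strings and numbers.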