Merge two data frames based on common column values in Pandas
Question
How can I merge two data frames that share a column so that the merged data frame contains only the rows whose value in that column appears in both?
I have df1 with 5000 rows, in this format:
   director_name   actor_1_name     actor_2_name      actor_3_name      movie_title
0  James Cameron   CCH Pounder      Joel David Moore  Wes Studi         Avatar
1  Gore Verbinski  Johnny Depp      Orlando Bloom     Jack Davenport    Pirates of the Caribbean: At World's End
2  Sam Mendes      Christoph Waltz  Rory Kinnear      Stephanie Sigman  Spectre
and df2 with 10000 rows, in this format:
movieId  genres                                       movie_title
1        Adventure|Animation|Children|Comedy|Fantasy  Toy Story
2        Adventure|Children|Fantasy                   Jumanji
3        Comedy|Romance                               Grumpier Old Men
4        Comedy|Drama|Romance                         Waiting to Exhale
The two data frames share the column 'movie_title', and some of its values occur in both. I want to keep every row whose 'movie_title' appears in both data frames and drop all other rows.
Any help or suggestions would be appreciated.
Note: I have already tried
pd.merge(dfinal, df1, on='movie_title')
and the output is just a single row:
director_name actor_1_name actor_2_name actor_3_name movie_title movieId title genres
I also tried how="outer", how="left" and how="right". In every case I got no rows after dropping NaN, even though many common column values do exist.
Recommended answer
Two data frames can be merged in several ways. The most common way in Python is the merge operation in pandas. With how='inner', only rows whose key value appears in both data frames are kept.
import pandas as pd

# Keep only the rows whose movie_title appears in both df1 and df2
dfinal = df1.merge(df2, on="movie_title", how="inner")
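To make the effect of the inner merge concrete, here is a minimal sketch on two tiny data frames. The rows are hypothetical stand-ins for the question's df1 and df2, chosen only so that exactly one title is shared.

```python
import pandas as pd

# Hypothetical miniature versions of df1 and df2 (illustrative rows only)
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes", "John Lasseter"],
    "movie_title": ["Avatar", "Spectre", "Toy Story"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
    ],
    "movie_title": ["Toy Story", "Jumanji"],
})

# how="inner" keeps only the titles that are present in both frames
dfinal = df1.merge(df2, on="movie_title", how="inner")
print(dfinal)
```

Only the "Toy Story" row survives, carrying the columns of both frames; rows whose title exists in just one frame are dropped.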
If the key column has a different name in each data frame, specify the left and right column names explicitly. For example, suppose the column is called 'movie_title' in df1 but 'movie_name' in df2:
dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')
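A small sketch of the same call on made-up data follows. The frames and the 'movie_name' column are hypothetical. One thing worth knowing is that both key columns appear in the result, so you may want to drop the redundant one afterwards.

```python
import pandas as pd

# Hypothetical frames whose key columns are named differently
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "John Lasseter"],
    "movie_title": ["Avatar", "Toy Story"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2],
    "movie_name": ["Toy Story", "Jumanji"],
})

# Match df1.movie_title against df2.movie_name
merged = df1.merge(df2, how="inner", left_on="movie_title", right_on="movie_name")

# Both key columns are kept in the result; drop the duplicate one
merged = merged.drop(columns="movie_name")
print(merged)
```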
For more detail on the available options, see the pandas documentation for the merge operation.
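If an inner merge returns fewer rows than expected, one way to investigate is the indicator option of merge, which labels where each row came from. The sketch below assumes, purely for illustration, that the mismatch is caused by stray whitespace in the titles; the question does not confirm this, and the data is made up.

```python
import pandas as pd

# Hypothetical data: "Avatar " has a trailing space, so it will not match "Avatar"
left = pd.DataFrame({"movie_title": ["Avatar ", "Spectre"]})
right = pd.DataFrame({"movie_title": ["Avatar", "Jumanji"]})

# indicator=True adds a _merge column with "both", "left_only" or "right_only"
check = left.merge(right, on="movie_title", how="outer", indicator=True)
n_before = int((check["_merge"] == "both").sum())
print(check)

# Normalise the keys (assumed cause: surrounding whitespace), then merge again
left["movie_title"] = left["movie_title"].str.strip()
right["movie_title"] = right["movie_title"].str.strip()
matched = left.merge(right, on="movie_title", how="inner")
print(matched)
```

If the _merge column shows no "both" rows for titles that look identical, printing them with repr() can reveal hidden characters.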