Merge two data frames based on common column values in Pandas

Question

How can I merge two data frames so that the result contains only the rows whose value in a particular column appears in both frames?

I have df1 with 5000 rows, in this format:

    director_name   actor_1_name     actor_2_name      actor_3_name      movie_title
0   James Cameron   CCH Pounder      Joel David Moore  Wes Studi         Avatar
1   Gore Verbinski  Johnny Depp      Orlando Bloom     Jack Davenport    Pirates of the Caribbean: At World's End
2   Sam Mendes      Christoph Waltz  Rory Kinnear      Stephanie Sigman  Spectre

and df2 with 10000 rows, in this format:

movieId   genres                                        movie_title
1         Adventure|Animation|Children|Comedy|Fantasy   Toy Story
2         Adventure|Children|Fantasy                    Jumanji
3         Comedy|Romance                                Grumpier Old Men
4         Comedy|Drama|Romance                          Waiting to Exhale

Both frames share a column, 'movie_title', and some of its values occur in both. I want to keep every row where 'movie_title' matches and drop all other rows.

Any help or suggestions would be appreciated.

Note: I have already tried

pd.merge(dfinal, df1, on='movie_title')

and the output comes out as a single row:

director_name   actor_1_name    actor_2_name    actor_3_name    movie_title movieId title   genres

I also tried how="outer", "left", and "right". In every case no rows were left after dropping NaN, even though many common values do exist in the column.

Recommended answer

There are several ways to merge two data frames. The most common in Python is the merge operation in pandas.

import pandas as pd

# Inner join: keep only rows whose movie_title appears in both frames
dfinal = df1.merge(df2, on="movie_title", how="inner")
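
To see the inner merge in action, here is a minimal runnable sketch. The two small frames are hand-made stand-ins for df1 and df2 (the rows are illustrative sample data, not the asker's files); only titles present in both frames should survive.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df1 and df2
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes", "John Lasseter"],
    "movie_title": ["Avatar", "Spectre", "Toy Story"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Action|Adventure|Thriller",  # illustrative value only
    ],
    "movie_title": ["Toy Story", "Jumanji", "Spectre"],
})

# how="inner" keeps a row only when its movie_title occurs in both frames
dfinal = df1.merge(df2, on="movie_title", how="inner")
print(dfinal)
```

"Avatar" and "Jumanji" each appear in only one frame, so they are dropped; the shared key column appears once in the result.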

If the key column has a different name in each data frame, specify the left and right column names separately. For example, suppose 'movie_title' in df1 is called 'movie_name' in df2:

dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')
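
A small sketch of the left_on/right_on form, again with made-up sample frames. One thing worth knowing: when the key names differ, pandas keeps both key columns in the result, so you may want to drop the redundant one afterwards (the drop step is a suggestion, not part of the original answer).

```python
import pandas as pd

# Hypothetical sample data; the key is named differently in each frame
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes"],
    "movie_title": ["Avatar", "Spectre"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2],
    "movie_name": ["Spectre", "Jumanji"],
})

dfinal = df1.merge(df2, how="inner", left_on="movie_title", right_on="movie_name")

# Both movie_title and movie_name are present after the merge;
# drop the duplicate key column if it is not needed
dfinal = dfinal.drop(columns="movie_name")
print(dfinal)
```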

For more detail, see the pandas documentation for the merge operation.
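
If an inner merge returns no rows even though the titles look identical, one possible cause (an assumption here, not something the question confirms) is stray whitespace in the key column. The sketch below uses made-up data to show how the indicator parameter of merge can reveal the mismatch, and how stripping the key on both sides may fix it.

```python
import pandas as pd

# Hypothetical data: same titles, but df1's copies carry trailing whitespace
df1 = pd.DataFrame({"movie_title": ["Avatar\xa0", "Spectre "]})
df2 = pd.DataFrame({"movie_title": ["Avatar", "Jumanji"], "movieId": [1, 2]})

# indicator=True adds a _merge column: left_only, right_only, or both
check = df1.merge(df2, on="movie_title", how="outer", indicator=True)
print(check["_merge"].value_counts())

# Normalise the key in both frames, then merge again
for df in (df1, df2):
    df["movie_title"] = df["movie_title"].str.strip()

dfinal = df1.merge(df2, on="movie_title", how="inner")
print(dfinal)
```

If the first merge reports no rows marked "both", the keys really do differ at the character level, and inspecting a few values with repr() is a reasonable next step.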
