Combine two pandas DataFrames (join on a common column)

Problem description

I have 2 DataFrames:

restaurant_ids_dataframe

Data columns (total 13 columns):
business_id      4503  non-null values
categories       4503  non-null values
city             4503  non-null values
full_address     4503  non-null values
latitude         4503  non-null values
longitude        4503  non-null values
name             4503  non-null values
neighborhoods    4503  non-null values
open             4503  non-null values
review_count     4503  non-null values
stars            4503  non-null values
state            4503  non-null values
type             4503  non-null values
dtypes: bool(1), float64(3), int64(1), object(8)

restaurant_review_frame

Int64Index: 158430 entries, 0 to 229905
Data columns (total 8 columns):
business_id    158430  non-null values
date           158430  non-null values
review_id      158430  non-null values
stars          158430  non-null values
text           158430  non-null values
type           158430  non-null values
user_id        158430  non-null values
votes          158430  non-null values
dtypes: int64(1), object(7)

I would like to join these two DataFrames into a single DataFrame using the DataFrame.join() command in pandas.

I tried the following line of code:

# Attempt a left join of restaurant_review_frame with restaurant_ids_dataframe on the column 'business_id'
restaurant_review_frame.join(other=restaurant_ids_dataframe,on='business_id',how='left')

But when I try this, I get the following error:

Exception: columns overlap: Index([business_id, stars, type], dtype=object)

I am very new to pandas and have no clue what I am doing wrong in the join statement.

Any help would be much appreciated.

Recommended answer

You can use merge to combine two dataframes into one:

import pandas as pd
pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer')

where on specifies the field name that exists in both dataframes to join on, and how defines whether it is an inner, outer, left, or right join, with outer using the 'union of keys from both frames (SQL: full outer join)'. Since both dataframes have a 'stars' column, by default this creates two columns, stars_x and stars_y, in the combined dataframe (the shared 'type' column is split into type_x and type_y in the same way). As @DanAllan mentioned for the join method, you can change the suffixes for merge by passing them as a keyword argument; the default is suffixes=('_x', '_y'). If you want names like stars_restaurant_id and stars_restaurant_review, you can do:

pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer', suffixes=('_restaurant_id', '_restaurant_review'))
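
To see how the suffixes behave, here is a minimal sketch on two toy frames. The data and the variable names ids and reviews are invented for illustration; only the column names mirror the frames in the question.

```python
import pandas as pd

# Toy stand-ins for the two frames in the question (all values are invented).
ids = pd.DataFrame({
    "business_id": ["a", "b", "c"],
    "name": ["Cafe A", "Diner B", "Grill C"],
    "stars": [4.0, 3.5, 5.0],
    "type": ["business"] * 3,
})
reviews = pd.DataFrame({
    "business_id": ["a", "a", "b", "d"],
    "review_id": ["r1", "r2", "r3", "r4"],
    "stars": [5, 3, 4, 2],
    "type": ["review"] * 4,
})

# Default suffixes: overlapping non-key columns get _x (left) and _y (right).
default_merged = pd.merge(ids, reviews, on="business_id", how="outer")

# Custom suffixes make it clearer which frame each column came from.
named_merged = pd.merge(
    ids,
    reviews,
    on="business_id",
    how="outer",
    suffixes=("_restaurant_id", "_restaurant_review"),
)

print(list(default_merged.columns))
print(list(named_merged.columns))
```

The key column business_id appears only once in the result; only the overlapping non-key columns (stars and type) are renamed.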

The parameters are explained in detail in the pandas documentation for merge.
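
If you would rather stay with DataFrame.join(), as in the question, the following sketch shows one way it can work. It relies on two points: join matches the caller's on column against the index of the other frame, so the other frame is indexed by business_id first, and the overlapping columns need lsuffix/rsuffix. The toy data and the suffix names are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the two frames in the question (all values are invented).
ids = pd.DataFrame({
    "business_id": ["a", "b", "c"],
    "name": ["Cafe A", "Diner B", "Grill C"],
    "stars": [4.0, 3.5, 5.0],
    "type": ["business"] * 3,
})
reviews = pd.DataFrame({
    "business_id": ["a", "a", "b", "d"],
    "review_id": ["r1", "r2", "r3", "r4"],
    "stars": [5, 3, 4, 2],
    "type": ["review"] * 4,
})

# join() looks up reviews['business_id'] in the index of the other frame,
# so index the restaurant frame by business_id before joining.
ids_by_business = ids.set_index("business_id")

# lsuffix/rsuffix rename the overlapping columns ('stars' and 'type').
joined = reviews.join(
    ids_by_business,
    on="business_id",
    how="left",
    lsuffix="_review",
    rsuffix="_restaurant",
)

print(joined)
```

With how='left', every review row is kept; reviews whose business_id has no match in the restaurant frame get NaN in the restaurant columns.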
