Need to handle a concatenated dataframe with non-unique multi-index
Problem description
This works:
import pandas as pd
raw_data = {
    'type_1': [1, 1],
    'id_1': ['2', '3'],
    'name_1': ['Alex', 'Amy']}
df_a = pd.DataFrame(raw_data, columns=['type_1', 'id_1', 'name_1'])
raw_datab = {
    'type_2': [1, 1],
    'id_2': ['4', '5'],
    'name_2': ['Billy', 'Brian']}
df_b = pd.DataFrame(raw_datab, columns=['type_2', 'id_2', 'name_2'])
dfs = [df_a.set_index(['type_1', 'id_1']),
       df_b.set_index(['type_2', 'id_2'])]
df = pd.concat(dfs, axis=1)
print(df)
Prints:
     name_1 name_2
1 2    Alex    NaN
  3     Amy    NaN
  4     NaN  Billy
  5     NaN  Brian
If I change the following, it doesn't work, because the multi-index key in raw_data is a duplicate:
raw_data = {
    'type_1': [1, 1],
    'id_1': ['2', '2'],  # <-- changed from 3 to 2
    'name_1': ['Alex', 'Amy']}
And the following:
raw_datab = {
    'type_2': [1, 1],
    'id_2': ['2', '5'],  # <-- changed from 4 to 2
    'name_2': ['Billy', 'Brian']}
As a result, Alex, Amy and Billy all have the same multi-index key (1, '2'), so the concat fails with:
cannot handle a non-unique multi-index!
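A minimal sketch of why the axis=1 call fails, assuming the question's modified data. concat(axis=1) aligns rows by index label, so pandas has to look up each frame's labels in the combined index, and that lookup needs unique labels. Index.is_unique reveals the problem before concat is called. The exact exception type and message vary across pandas versions, so the sketch catches Exception broadly rather than naming one class.

```python
import pandas as pd

# The question's modified data: (1, '2') occurs twice in df_a and once in df_b.
df_a = pd.DataFrame({'type_1': [1, 1], 'id_1': ['2', '2'], 'name_1': ['Alex', 'Amy']})
df_b = pd.DataFrame({'type_2': [1, 1], 'id_2': ['2', '5'], 'name_2': ['Billy', 'Brian']})

left = df_a.set_index(['type_1', 'id_1'])
right = df_b.set_index(['type_2', 'id_2'])

# axis=1 aligns on the row index, which requires unique labels in each frame.
print(left.index.is_unique)   # False: (1, '2') is repeated
print(right.index.is_unique)  # True

try:
    pd.concat([left, right], axis=1)
    concat_failed = False
except Exception as exc:  # exception class/message depends on the pandas version
    concat_failed = True
    print(type(exc).__name__, exc)
```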
But the duplicate data is valid, and I need to concatenate it anyway. This is the result I need to achieve (note that this should be an outer join, the default):
     name_1 name_2
1 2     Amy  Billy
  2    Alex  Billy
  5     NaN  Brian
How's this possible with Pandas?
Recommended answer
Change axis=1 to axis=0 (the default):
df = pd.concat(dfs)
df
Out[52]:
            name_1 name_2
type_1 id_1
1      2      Alex    NaN
       2       Amy    NaN
       4       NaN  Billy
       5       NaN  Brian
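For comparison, here is a sketch that runs the same axis=0 call on the question's modified data (the Out[52] listing shows id_2 value 4, i.e. the original raw_datab). Stacking along axis 0 does not align anything, so duplicate labels are accepted. However, every row still comes from exactly one frame, so Alex and Amy are never paired with Billy.

```python
import pandas as pd

# The question's modified data, with the duplicated key (1, '2').
df_a = pd.DataFrame({'type_1': [1, 1], 'id_1': ['2', '2'], 'name_1': ['Alex', 'Amy']})
df_b = pd.DataFrame({'type_2': [1, 1], 'id_2': ['2', '5'], 'name_2': ['Billy', 'Brian']})

dfs = [df_a.set_index(['type_1', 'id_1']),
       df_b.set_index(['type_2', 'id_2'])]

# axis=0 appends rows and takes the union of the columns; no label lookup
# happens along the row index, so the duplicate (1, '2') does not raise.
stacked = pd.concat(dfs)
print(stacked)

# A row originates from only one input frame, so one name column is always NaN.
paired = stacked['name_1'].notna() & stacked['name_2'].notna()
print(paired.any())  # False
```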
Based on your comment:
(df_a.merge(df_b, left_on=['type_1', 'id_1'], right_on=['type_2', 'id_2'], how='outer')
     .set_index(['type_2', 'id_2']).drop(columns=['type_1', 'id_1']))
Out[80]:
            name_1 name_2
type_2 id_2
1      2      Alex  Billy
       2       Amy  Billy
       5       NaN  Brian
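A variant of the same merge, sketched under the assumption that the key columns can be renamed to shared names (type and id are illustrative names, not from the question). When both frames are merged on common names, the result has a single pair of key columns, so no drop is needed. Rows that exist only in df_a also keep their keys. In the set_index(['type_2', 'id_2']) version, such rows would get NaN keys.

```python
import pandas as pd

df_a = pd.DataFrame({'type_1': [1, 1], 'id_1': ['2', '2'], 'name_1': ['Alex', 'Amy']})
df_b = pd.DataFrame({'type_2': [1, 1], 'id_2': ['2', '5'], 'name_2': ['Billy', 'Brian']})

# Hypothetical shared key names so merge emits one set of key columns.
left = df_a.rename(columns={'type_1': 'type', 'id_1': 'id'})
right = df_b.rename(columns={'type_2': 'type', 'id_2': 'id'})

# merge is a relational join: duplicate keys are allowed and produce one row per
# matching (left, right) pair; how='outer' also keeps keys found on one side only.
result = (pd.merge(left, right, on=['type', 'id'], how='outer')
            .set_index(['type', 'id']))
print(result)
```

As in the answer's version, this is a many-to-many join. Each duplicated key in df_a is paired with every matching row in df_b, which is why Billy appears twice.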