仅将一个键列复制到合并的DataFrame中 [英] Only copy one key-column into merged DataFrame

查看:110
本文介绍了仅将一个键列复制到合并的DataFrame中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请考虑以下数据框:

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})

在这种情况下,df1['b']df2['c']是键列.因此,在合并时:

In this instance, df1['b'] and df2['c'] are the key columns. So when merging:

df1.merge(df2, left_on='b', right_on='c')
   a  b  c     d
0  0  a  a  Alex
1  1  b  b  Alex
2  2  c  c  Alex
3  3  d  d  Alex

当我只需要一个时,最后在结果DataFrame中使用了两个关键列.我一直在使用:

I end up with both key columns in the resultant DataFrame when I only need one. I've been using:

df1.merge(df2, left_on='b', right_on='c').drop('c', axis='columns')

有没有办法只保留一个关键列?

Is there a way to only keep one key column?

推荐答案

一种方法是分别将bc设置为帧的索引,并使用join后跟reset_index: >

One way is to set b and c as the index of your frames respectively, and use join followed by reset_index:

df1.set_index('b').join(df2.set_index('c')).reset_index()

   b  a     d
0  a  0  Alex
1  b  1  Alex
2  c  2  Alex
3  d  3  Alex

在大型数据帧上,这将比merge/drop方法快,主要是因为drop很慢. @Bill的方法比我的建议快,而@ W-B& @PiRsquared可以轻松地超越其他建议:

This will be faster than the merge/drop method on large dataframes, mostly because drop is slow. @Bill's method is faster than my suggestion, and @W-B & @PiRsquared easily outspeed the other suggestions:

import timeit

df1 = pd.concat((df1 for _ in range(1000)))
df2 = pd.concat((df2 for _ in range(1000)))

def index_method(df1 = df1, df2 = df2):
    return df1.set_index('b').join(df2.set_index('c')).reset_index()


def merge_method(df1 = df1, df2=df2):
    return df1.merge(df2, left_on='b', right_on='c').drop('c', axis='columns')

def rename_method(df1 = df1, df2 = df2):
    return df1.rename({'b': 'c'}, axis=1).merge(df2)

def index_method2(df1 = df1, df2 = df2):
    return df1.join(df2.set_index('c'), on='b')

def assign_method(df1 = df1, df2 = df2):
    return df1.set_index('b').assign(c=df2.set_index('c').d).reset_index()

def map_method(df1 = df1, df2 = df2):
    return df1.assign(d=df1.b.map(dict(df2.values)))

>>> timeit.timeit(index_method, number=10) / 10
0.7853091600998596
>>> timeit.timeit(merge_method, number=10) / 10
1.1696729859002517
>>> timeit.timeit(rename_method, number=10) / 10
0.4291436871004407
>>> timeit.timeit(index_method2, number=10) / 10
0.5037374985004135
>>> timeit.timeit(assign_method, number=10) / 10
0.0038641377999738325
>>> timeit.timeit(map_method, number=10) / 10
0.006620216699957382

这篇关于仅将一个键列复制到合并的DataFrame中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆