Pandas join vs add column

Question

I have two DataFrames, df1 and df2, with the same MultiIndex. df1 has column A and df2 has column B.

I found two ways of "joining" these DataFrames:

df_joined = df1.join(df2, how='inner')

df1['B'] = df2['B']

The first option takes much longer. Why? Does option 2 ignore the indexes and simply attach the column on the right?

Running the following afterwards returns True, so the end result appears to be the same, but perhaps only because the indexes in df1 and df2 are also in the same order:

df_joined.equals(df1)

Is there a faster way to join the DataFrames, given that the indexes are known to be the same?

Recommended answer

If the indices are aligned, there is no faster way than df1['B'] = df2['B'].

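Assigning a Series as a DataFrame column is already well optimised in pandas. The assignment does look at the index: pandas aligns df2['B'] to df1.index by label, and that alignment costs little when the two indexes are already equal.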
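To compare the two options on your own data, a rough timing sketch along these lines may help. The frame size, level names, and helper function names are assumptions made for illustration, and the measured gap will vary with the machine, the pandas version, and the shape of the index.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # hypothetical size; increase it to see how the gap scales
idx = pd.MultiIndex.from_arrays(
    [np.arange(n) // 100, np.arange(n) % 100], names=["outer", "inner"]
)
df1 = pd.DataFrame({"A": np.random.rand(n)}, index=idx)
df2 = pd.DataFrame({"B": np.random.rand(n)}, index=idx)


def with_join():
    # Option 1: index-on-index join.
    return df1.join(df2, how="inner")


def with_assignment():
    # Option 2: column assignment. Copy first so df1 stays unchanged
    # between timing runs (the copy is included in the measured time).
    out = df1.copy()
    out["B"] = df2["B"]
    return out


t_join = timeit.timeit(with_join, number=20)
t_assign = timeit.timeit(with_assignment, number=20)
print(f"join:       {t_join:.4f} s for 20 runs")
print(f"assignment: {t_assign:.4f} s for 20 runs")
```

Because the assignment variant also pays for a copy of df1 here, the comparison is, if anything, biased against it.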

join takes longer than assignment because it explicitly lines up df1.index and df2.index, which is expensive; it does not assume that the indices are in a consistent order. As per the pd.DataFrame.join documentation, if no column is specified, the join takes place on the DataFrames' respective indices.
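The question of whether either option relies on row order can be checked on a small case. The sketch below uses made-up toy data (the level names and values are assumptions) in which df2 holds the same MultiIndex labels as df1 but in reverse order. Both join and column assignment match rows by label, so neither depends on the rows being in the same position.

```python
import pandas as pd

# Hypothetical toy data: same index labels, but df2 stores them reversed.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["key", "num"]
)
df1 = pd.DataFrame({"A": [10, 20, 30]}, index=idx)
df2 = pd.DataFrame({"B": [3.0, 2.0, 1.0]}, index=idx[::-1])

# Option 1: join matches rows on index labels.
df_joined = df1.join(df2, how="inner")

# Option 2: assignment aligns df2["B"] to df1's index labels as well.
df_assigned = df1.copy()
df_assigned["B"] = df2["B"]

# Look up one label in each result; both should hold the value that
# df2 stores under ("a", 1), not the value in df2's first row.
print(df_joined.loc[("a", 1), "B"])
print(df_assigned.loc[("a", 1), "B"])
```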

I would be surprised if this turned out to be a bottleneck in your workflow. If it is, I suggest you drop down to numpy arrays and avoid pandas altogether.
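One way to read the numpy suggestion, sketched here under the assumption that both frames really do hold the same labels in the same order, is to assign the raw values so that pandas skips label alignment entirely. The toy data and the guard clause are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data sharing one MultiIndex in identical order.
idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["key", "num"])
df1 = pd.DataFrame({"A": np.arange(4)}, index=idx)
df2 = pd.DataFrame({"B": np.arange(4) * 0.5}, index=idx)

# Positional assignment is only safe when both indexes hold the same
# labels in the same order, so verify that before discarding the index.
if not df1.index.equals(df2.index):
    raise ValueError("indexes differ; use df1.join(df2) or df1['B'] = df2['B']")

# to_numpy() returns a plain array without an index, so the values are
# written row by row in position order with no label matching.
df1["B"] = df2["B"].to_numpy()
```

The trade-off is that a plain array carries no labels: if the row order ever differs, the values end up silently attached to the wrong rows, which is why the check comes first.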
