Difference(s) between merge() and concat() in pandas

Question

What is the essential difference between pd.DataFrame.merge() and pd.concat()?

So far, this is what I have found; please comment on how complete and accurate my understanding is:

  • .merge() can only join on columns (or the row index) and is semantically suited to database-style operations. .concat() can work along either axis, aligns on indices only, and offers the option of adding a hierarchical index.
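The concat() half of that point can be sketched as follows. The frames q1 and q2 and their values are made up for illustration and are not from the question; the sketch only shows the axis and keys options of pd.concat().

```python
import pandas as pd

# Two small frames with the same column and the same row labels
# (illustrative data, not taken from the question).
q1 = pd.DataFrame({"sales": [10, 20]}, index=["north", "south"])
q2 = pd.DataFrame({"sales": [30, 40]}, index=["north", "south"])

# axis=0 (the default) stacks rows; keys= adds an outer index level
# that records which input frame each row came from.
stacked = pd.concat([q1, q2], keys=["q1", "q2"])

# axis=1 places the frames side by side, aligned on the row index;
# here keys= labels the outer level of the columns instead.
side_by_side = pd.concat([q1, q2], axis=1, keys=["q1", "q2"])

print(stacked)
print(side_by_side)
```

In both cases the alignment is done on index labels, never on column values, which is the main contrast with merge().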

Incidentally, this allows for the following redundancy: both can combine two dataframes using the row indices.
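A minimal sketch of that overlap, using two hypothetical frames (left and right are illustrative names, not from the question): an outer merge on both indices and a column-wise concat give the same table here. This assumes the index labels are unique; with repeated labels the two functions no longer behave alike.

```python
import pandas as pd

# Illustrative frames with partly overlapping, unique row labels.
left = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"y": [10, 20, 30]}, index=["b", "c", "d"])

# Database-style: join on the row index of both frames, keeping all labels.
via_merge = pd.merge(left, right, left_index=True, right_index=True, how="outer")

# Concatenation-style: glue the columns together, aligning on the row index.
via_concat = pd.concat([left, right], axis=1)

print(via_merge)
print(via_concat)
```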

pd.DataFrame.join() merely offers a shorthand for a subset of the use cases of .merge().

(Pandas is great at addressing a very wide spectrum of use cases in data analysis. Exploring the documentation to figure out the best way to perform a particular task can be a bit daunting.)

Recommended answer

At a very high level, merge() combines two dataframes on the basis of the values in their common columns (indices can be used as well, via left_index=True and/or right_index=True), whereas concat() appends one or more dataframes one below the other (or side by side), depending on whether the axis option is set to 0 or 1.

join() merges two dataframes on the basis of the index; instead of using merge() with the option left_index=True, we can use join().
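As a rough sketch of that equivalence (left and right are hypothetical frames, not from the answer): DataFrame.join() defaults to a left join on the index, while merge() defaults to an inner join, so how="left" has to be spelled out on the merge() side to get the same result.

```python
import pandas as pd

# Illustrative frames indexed by label; "c" has no match in `right`.
left = pd.DataFrame({"data1": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"data2": [10, 20]}, index=["a", "b"])

# join(): index-on-index, left join by default.
joined = left.join(right)

# The same operation written with merge(); how="left" is needed because
# merge() would otherwise do an inner join and drop row "c".
merged = pd.merge(left, right, left_index=True, right_index=True, how="left")

print(joined)
print(joined.equals(merged))
```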

For example:

import pandas as pd

df1 = pd.DataFrame({'Key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})

df1:
  Key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   a      5
6   b      6

df2 = pd.DataFrame({'Key': ['a', 'b', 'd'], 'data2': range(3)})

df2:
  Key  data2
0   a      0
1   b      1
2   d      2

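# Merge
# The two dataframes are merged on the values in column "Key", because it is
# the only column they have in common. The default is an inner join, so 'c'
# (only in df1) and 'd' (only in df2) are dropped.
# Note: the row order of the result may differ depending on the pandas version.

pd.merge(df1, df2)

  Key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0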

#Concat
# df2 dataframe is appended at the bottom of df1 

pd.concat([df1, df2])

  Key  data1  data2
0   b    0.0    NaN
1   b    1.0    NaN
2   a    2.0    NaN
3   c    3.0    NaN
4   a    4.0    NaN
5   a    5.0    NaN
6   b    6.0    NaN
0   a    NaN    0.0
1   b    NaN    1.0
2   d    NaN    2.0
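Two variations on the example above may help make the contrast concrete. This is only a sketch built on the same df1 and df2; the how="outer" and ignore_index=True options are not used in the answer itself and are shown here as possible follow-ups.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'Key': ['a', 'b', 'd'], 'data2': range(3)})

# Default merge is an inner join: 'c' (only in df1) and 'd' (only in df2) drop out.
inner = pd.merge(df1, df2, on='Key')

# how='outer' keeps the unmatched keys from both sides and fills the gaps with NaN.
outer = pd.merge(df1, df2, on='Key', how='outer')

# concat keeps each frame's original index, so labels 0 to 2 appear twice;
# ignore_index=True renumbers the rows of the result from 0 upwards.
stacked = pd.concat([df1, df2], ignore_index=True)

print(outer)
print(stacked)
```

merge() matches rows by key value and can therefore repeat or drop rows, while concat() never looks at the values in "Key" at all; it only stacks the inputs and aligns the column labels.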
