Difference(s) between merge() and concat() in pandas
Question
What are the essential differences between pd.DataFrame.merge() and pd.concat()?
So far, this is what I have found; please comment on how complete and accurate my understanding is:
- .merge() can only use columns (plus row indices) and is semantically suited to database-style operations.
- .concat() can be used with either axis, using only indices, and gives the option of adding a hierarchical index.
Incidentally, this allows for the following redundancy: both can combine two dataframes using the row indices.
pd.DataFrame.join() merely offers a shorthand for a subset of the use cases of .merge().
(Pandas is great at addressing a very wide spectrum of use cases in data analysis. It can be a bit daunting to explore the documentation to figure out the best way to perform a particular task.)
Recommended answer
At a very high level, merge() combines two dataframes on the basis of the values of common columns (indices can also be used, with left_index=True and/or right_index=True), whereas concat() appends two or more dataframes one below the other (axis=0, the default) or side by side (axis=1).
join() is used to merge two dataframes on the basis of the index; instead of using merge() with the options left_index=True and right_index=True, we can use join(). Note that join() defaults to a left join, while merge() defaults to an inner join.
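As a minimal sketch of that equivalence, the snippet below compares join() with the spelled-out merge() call. The frames `left` and `right` and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical frames that share index labels 'a' and 'b'
left = pd.DataFrame({'data1': [1, 2, 3]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'data2': [10, 20]}, index=['a', 'b'])

# join() aligns on the index and keeps every row of `left` (how='left' by default)
joined = left.join(right)

# The same operation written with merge(); how='left' must be given explicitly
# because merge() defaults to how='inner'
merged = pd.merge(left, right, left_index=True, right_index=True, how='left')

print(joined)
print(joined.equals(merged))
```

Row 'c' has no match in `right`, so its data2 value is NaN in both results.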
For example:
import pandas as pd

df1 = pd.DataFrame({'Key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df1:
  Key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   a      5
6   b      6

df2 = pd.DataFrame({'Key': ['a', 'b', 'd'], 'data2': range(3)})
df2:
  Key  data2
0   a      0
1   b      1
2   d      2
# Merge
# The two dataframes are merged on the values in column "Key", since it is
# the only column common to both. The default is an inner join, so keys
# 'c' (only in df1) and 'd' (only in df2) are dropped.
pd.merge(df1, df2)
  Key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0
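The inner join above silently drops unmatched keys. As a hedged sketch, the `how` and `indicator` parameters of pd.merge() can be used to keep those rows and see where each one came from. The names `inner` and `outer` are introduced here for illustration, and the checks avoid relying on row order, which can differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'Key': ['a', 'b', 'd'], 'data2': range(3)})

# Default behaviour: only keys present in both frames survive
inner = pd.merge(df1, df2, on='Key')

# Outer join keeps every key; indicator=True adds a '_merge' column
# marking each row as 'both', 'left_only' or 'right_only'
outer = pd.merge(df1, df2, on='Key', how='outer', indicator=True)

print(sorted(inner['Key'].unique()))
print(outer[outer['_merge'] != 'both'])
```

Here 'c' shows up as left_only and 'd' as right_only, with NaN in the column that has no matching value.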
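# Concat
# df2 is appended below df1; values are not matched on "Key".
# Columns missing from one frame are filled with NaN, which also
# turns the integer columns into floats. The original row labels are kept.
pd.concat([df1, df2])
  Key  data1  data2
0   b    0.0    NaN
1   b    1.0    NaN
2   a    2.0    NaN
3   c    3.0    NaN
4   a    4.0    NaN
5   a    5.0    NaN
6   b    6.0    NaN
0   a    NaN    0.0
1   b    NaN    1.0
2   d    NaN    2.0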
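The other concat() behaviours mentioned in the question (either axis, optional hierarchical index) can be sketched in the same way. The variable names `stacked`, `labelled` and `side_by_side` and the key labels 'df1' and 'df2' are assumptions made for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'Key': ['a', 'b', 'd'], 'data2': range(3)})

# Stack vertically and discard the repeated 0..6 / 0..2 row labels
stacked = pd.concat([df1, df2], ignore_index=True)

# Stack vertically and record the source frame in an outer index level
labelled = pd.concat([df1, df2], keys=['df1', 'df2'])

# Place the frames side by side; rows are aligned on the row index only,
# so the two "Key" columns are NOT matched against each other
side_by_side = pd.concat([df1, df2], axis=1)

print(labelled.loc['df2'])
print(side_by_side)
```

With axis=1 the result has two columns named "Key", and rows 3 to 6 hold NaN in the columns from df2, which shows that concat() aligns on labels rather than on column values as merge() does.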