pandas: merged (inner join) data frame has more rows than the original ones

Views: 63  Published: 2020/5/23 22:54:57  Tags: python python-3.x pandas dataframe


Problem description

I am using Python 3.4 in a Jupyter Notebook and trying to merge two data frames, as shown below:

df_A.shape
(204479, 2)

df_B.shape
(178, 3)

new_df = pd.merge(df_A, df_B,  how='inner', on='my_icon_number')
new_df.shape
(266788, 4)

I expected the merged new_df above to have fewer rows than df_A, since the merge is an inner join. Why does new_df actually have more rows than df_A?

Here is what I actually want:

My df_A looks like:

 id           my_icon_number
-----------------------------
 A1             123             
 B1             234
 C1             123
 D1             235
 E1             235
 F1             400

and my df_B looks like:

my_icon_number    color      size
-------------------------------------
  123              blue      small
  234              red       large 
  235              yellow    medium

Then I want new_df to be:

 id           my_icon_number     color       size
--------------------------------------------------
 A1             123              blue        small
 B1             234              red         large
 C1             123              blue        small
 D1             235              yellow      medium
 E1             235              yellow      medium

I don't want to remove the duplicates of my_icon_number in df_A. Any idea what I am missing here?

Recommended answer

Example

In this example, the only value the two data frames have in common is 4, but it appears 3 times in each of them. That means the merge should produce 9 rows in total, one for every combination of a matching row in df_A with a matching row in df_B.
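The "one for every combination" count can be checked before merging. The sketch below is not part of the original answer: `expected_inner_rows` is a hypothetical helper name, and it assumes the key column has no missing values. It multiplies the number of occurrences of each key on the two sides and sums the products.

```python
import pandas as pd

# Same frames as in the answer's example: key 4 appears 3 times on each side.
df_A = pd.DataFrame(dict(my_icon_number=[1, 2, 3, 4, 4, 4], other_column1=range(6)))
df_B = pd.DataFrame(dict(my_icon_number=[4, 4, 4, 5, 6, 7], other_column2=range(6)))


def expected_inner_rows(left, right, key):
    """Predict the inner-join row count from per-key counts (hypothetical helper)."""
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # The product aligns on key value; keys present on only one side become
    # NaN, and sum() skips NaN by default, so they contribute nothing.
    return int((left_counts * right_counts).sum())


predicted = expected_inner_rows(df_A, df_B, "my_icon_number")
actual = len(pd.merge(df_A, df_B, how="inner", on="my_icon_number"))
print(predicted, actual)  # both are 9: 3 rows in df_A times 3 rows in df_B for key 4
```

An inner join can therefore only shrink the left frame when every key is unique on the right-hand side.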

import pandas as pd

df_A = pd.DataFrame(dict(my_icon_number=[1, 2, 3, 4, 4, 4], other_column1=range(6)))
df_B = pd.DataFrame(dict(my_icon_number=[4, 4, 4, 5, 6, 7], other_column2=range(6)))

pd.merge(df_A, df_B, how='inner', on='my_icon_number')

   my_icon_number  other_column1  other_column2
0               4              3              0
1               4              3              1
2               4              3              2
3               4              4              0
4               4              4              1
5               4              4              2
6               4              5              0
7               4              5              1
8               4              5              2
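Applied to the question, the row counts (266788 merged rows against 204479 in df_A) would be consistent with some my_icon_number values appearing more than once in df_B. That is an inference from the numbers, not something the question confirms. The sketch below rebuilds the question's sample tables, shows that the merge returns the desired 5 rows when df_B has one row per key, and then uses a made-up lookup table with a repeated key (`df_B_dup`, an assumption for illustration only) to show one way to detect and handle the situation.

```python
import pandas as pd

# Sample tables from the question; df_B has one row per my_icon_number here.
df_A = pd.DataFrame({
    "id": ["A1", "B1", "C1", "D1", "E1", "F1"],
    "my_icon_number": [123, 234, 123, 235, 235, 400],
})
df_B = pd.DataFrame({
    "my_icon_number": [123, 234, 235],
    "color": ["blue", "red", "yellow"],
    "size": ["small", "large", "medium"],
})

# validate="many_to_one" makes merge raise if the right-hand keys are not unique.
new_df = pd.merge(df_A, df_B, how="inner", on="my_icon_number", validate="many_to_one")
print(new_df.shape)  # (5, 4): duplicates in df_A are kept, F1 has no match

# Hypothetical lookup table in which key 123 appears twice (illustration only).
df_B_dup = pd.concat([df_B, df_B.iloc[[0]]], ignore_index=True)

# List every row of the lookup table whose key is repeated.
dup_rows = df_B_dup[df_B_dup.duplicated("my_icon_number", keep=False)]
print(dup_rows)

try:
    pd.merge(df_A, df_B_dup, how="inner", on="my_icon_number", validate="many_to_one")
    raised = False
except ValueError:  # pandas raises MergeError, a subclass of ValueError
    raised = True

# One possible fix: keep a single row per key in the lookup table, then merge.
# Which row to keep is a data decision; keep="first" is only an example.
df_B_unique = df_B_dup.drop_duplicates(subset="my_icon_number", keep="first")
fixed = pd.merge(df_A, df_B_unique, how="inner", on="my_icon_number", validate="many_to_one")
print(fixed.shape)  # (5, 4)
```

Deduplicating df_B leaves the repeated keys in df_A untouched, which matches what the question asks for. If the repeated rows in df_B carry different colors or sizes, dropping them loses information, so it is worth inspecting `dup_rows` first.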
