Compare column values in two or three dataframes and merge

Question

I have already checked a few earlier questions, but my problem is somewhat different. I have three Excel files, each loaded into its own dataframe. I need to add up the contents of excel_1 and excel_2 and compare the result against excel_3.

Example data (excel_1, sales for Territory #1):

  Name  Year    Item    sales_Amount1
  A1        1.2019  Badam   2
  A1        1.2019  Badam   10
  A1        1.2019  carrot  8
  A1        1.2019  carrot  10
  A2        1.2019  Badam   10
  A2        1.2019  Badam   20
  A3        2.2019  soap    3
  A1        2.2019  soap    1

Example data (excel_2, sales for Territory #2):

  Name  Year    Item    sales_Amount2
  A1        1.2019  Badam   60
  A1        1.2019  Badam   10
  A2        1.2019  Badam   40
  A2        1.2019  Badam   1
  A3        2.2019  soap    1
  A3        2.2019  soap    10
  A1        2.2019  soap    10

excel_3 (targets) has similar data:

  Name  Year    Item    target_Amount
  A1        1.2019  Badam   100
  A2        1.2019  Badam   30
  A1        1.2019  carrot  200
  A3        2.2019  soap    3

I need to add sales amounts 1 and 2 and compare the result against the target data. I would like to end up with a single CSV with the columns shown below, so that I can then do the calculations I mentioned.

  Name   Year    Item    sales_Amount1  sales_Amount2  target_Amount
  A1     1.2019  Badam   12             70             100
  A1     1.2019  carrot  18             0              200
  A2     1.2019  Badam   30             41             30
  A1     2.2019  soap    1              10             0
  A3     2.2019  soap    3              11             3

import pandas as pd

# Load each workbook and sum the amounts per (Name, Year, Item)
df1 = pd.read_excel(r"excel_1.xlsx")
sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})

df2 = pd.read_excel(r"excel_2.xlsx")
sum_sales2 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})

df3 = pd.read_excel(r"excel_3.xlsx")
sum_sales3 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})

I load each Excel file into a dataframe and then use groupby to get the aggregated sum for each item, as shown above. The tricky part is comparing and merging all three dataframes into one with the following columns:

Name   Year    Item   sales_Amount1  sales_Amount2  target_Amount

Is this the right approach for comparing three dataframes and merging them into one, or should I use a pivot table instead? I am not sure which one to choose before I proceed. Thanks.

Recommended answer

Use concat with DataFrame.fillna:

# Sum each amount column per (Name, Year, Item); the keys become the index
sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})
sum_sales2 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})
sum_sales3 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})

# Align the three results on the shared index; missing combinations become 0
df = (pd.concat([sum_sales1, sum_sales2, sum_sales3],
        axis=1).fillna(0).astype(int).reset_index())
print(df)
  Name    Year    Item  sales_Amount1  sales_Amount2  target_Amount
0   A1  1.2019   Badam             12             70            100
1   A1  1.2019  carrot             18              0            200
2   A1  2.2019    soap              1             10              0
3   A2  1.2019   Badam             30             41             30
4   A3  2.2019    soap              3             11              3
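
To try the approach above without the Excel files, the sample tables from the question can be built in memory. This is only a sketch under a few assumptions: Year is stored as a string here (read_excel may well return it as a float), the names keys and merged are made up for the example, and sort_index() is added so the row order does not depend on how concat orders the combined index.

```python
import pandas as pd

keys = ["Name", "Year", "Item"]

# Sample rows copied from the question (Year kept as a string: an assumption)
df1 = pd.DataFrame({
    "Name": ["A1", "A1", "A1", "A1", "A2", "A2", "A3", "A1"],
    "Year": ["1.2019"] * 6 + ["2.2019"] * 2,
    "Item": ["Badam", "Badam", "carrot", "carrot",
             "Badam", "Badam", "soap", "soap"],
    "sales_Amount1": [2, 10, 8, 10, 10, 20, 3, 1],
})
df2 = pd.DataFrame({
    "Name": ["A1", "A1", "A2", "A2", "A3", "A3", "A1"],
    "Year": ["1.2019"] * 4 + ["2.2019"] * 3,
    "Item": ["Badam", "Badam", "Badam", "Badam", "soap", "soap", "soap"],
    "sales_Amount2": [60, 10, 40, 1, 1, 10, 10],
})
df3 = pd.DataFrame({
    "Name": ["A1", "A2", "A1", "A3"],
    "Year": ["1.2019"] * 3 + ["2.2019"],
    "Item": ["Badam", "Badam", "carrot", "soap"],
    "target_Amount": [100, 30, 200, 3],
})

# One single-column dataframe per source, indexed by (Name, Year, Item)
sums = [
    df1.groupby(keys)[["sales_Amount1"]].sum(),
    df2.groupby(keys)[["sales_Amount2"]].sum(),
    df3.groupby(keys)[["target_Amount"]].sum(),
]

# Outer-align on the index, fill missing combinations with 0, restore columns
merged = (
    pd.concat(sums, axis=1)
    .fillna(0)
    .astype(int)
    .sort_index()
    .reset_index()
)
print(merged)
```

Combinations that exist in only one file, such as A1 / 1.2019 / carrot (no rows in excel_2), show up as 0 rather than being dropped, because concat along axis=1 keeps the union of the index values by default.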

If the column to aggregate is always the last column of each dataframe, use a list comprehension:

dfs = [df1, df2, df3]
# x.columns[-1] picks the last column of each dataframe as the one to sum
dfs = [x.groupby(['Name','Year', 'Item']).agg({x.columns[-1]: 'sum'}) for x in dfs]

df = pd.concat(dfs, axis=1).fillna(0).astype(int).reset_index()
print(df)
  Name    Year    Item  sales_Amount1  sales_Amount2  target_Amount
0   A1  1.2019   Badam             12             70            100
1   A1  1.2019  carrot             18              0            200
2   A1  2.2019    soap              1             10              0
3   A2  1.2019   Badam             30             41             30
4   A3  2.2019    soap              3             11              3
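
Once the merged table exists, the comparison the question asks for (sales 1 plus sales 2 against the target) is a matter of ordinary column arithmetic. The sketch below starts from the merged output shown above, typed in by hand so it runs on its own; the column names total_sales, diff_to_target and target_met are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Merged result from the answer above, entered directly for a standalone example
merged = pd.DataFrame({
    "Name": ["A1", "A1", "A1", "A2", "A3"],
    "Year": ["1.2019", "1.2019", "2.2019", "1.2019", "2.2019"],
    "Item": ["Badam", "carrot", "soap", "Badam", "soap"],
    "sales_Amount1": [12, 18, 1, 30, 3],
    "sales_Amount2": [70, 0, 10, 41, 11],
    "target_Amount": [100, 200, 0, 30, 3],
})

# Add both territories, then compare the total with the target
result = merged.assign(
    total_sales=merged["sales_Amount1"] + merged["sales_Amount2"]
)
result["diff_to_target"] = result["total_sales"] - result["target_Amount"]
result["target_met"] = result["total_sales"] >= result["target_Amount"]

# to_csv() without a path returns the CSV as a string
csv_text = result.to_csv(index=False)
print(csv_text)
```

Passing a file name to to_csv, for example result.to_csv("sales_vs_target.csv", index=False), would write the single CSV the question asks for instead of returning a string; the file name is just a placeholder.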
