Selectively building a new DataFrame from existing DataFrames, with an added calculation

Question

Fill in the Pandas code below to create a new DataFrame, customer_spend, that contains the following columns in this order: customer_id, name, and total_spend. total_spend is a new column containing the sum of the cost of all the orders that a particular customer placed.

I'm taking an online course on Python Pandas. As I wrote above, the goal of this code is to create a new DataFrame called 'customer_spend' with the columns customer_id, name, and total_spend.

What I'm having trouble with is building a DataFrame from only some of the columns of two different, existing DataFrames. I tried merge, but it keeps every column of both DataFrames. In addition, I'm having difficulty renaming the column to 'total_spend'.

import pandas as pd
import numpy as np

customers = pd.DataFrame([[100, 'Prometheus Barwis', 'prometheus.barwis@me.com',
    '(533) 072-2779'],[101, 'Alain Hennesey', 'alain.hennesey@facebook.com',
    '(942) 208-8460'],[102, 'Chao Peachy', 'chao.peachy@me.com',
    '(510) 121-0098'],[103, 'Somtochukwu Mouritsen',
    'somtochukwu.mouritsen@me.com','(669) 504-8080'],[104,
    'Elisabeth Berry', 'elisabeth.berry@facebook.com','(802) 973-8267']],
    columns = ['customer_id', 'name', 'email', 'phone'])
orders = pd.DataFrame([[1000, 100, 144.82], [1001, 100, 140.93],
   [1002, 102, 104.26], [1003, 100, 194.6 ], [1004, 100, 307.72],
   [1005, 101,  36.69], [1006, 104,  39.59], [1007, 104, 430.94],
   [1008, 103,  31.4 ], [1009, 104, 180.69], [1010, 102, 383.35],
   [1011, 101, 256.2 ], [1012, 103, 930.56], [1013, 100, 423.77],
   [1014, 101, 309.53], [1015, 102, 299.19]],
   columns = ['order_id', 'customer_id', 'order_total'])

combined = pd.merge(customers,orders, on='customer_id')
grouped = combined.groupby('customer_id')['order_total']
grouped.aggregate(np.sum).reset_index()

Desired result: a DataFrame named 'customer_spend' with the columns customer_id, name, and total_spend, where total_spend is a new column containing the sum of order_total for each customer.

What I've got so far: only customer_id and order_total.

I'm still new to this community. If I'm doing something inappropriate, please let me know. Thank you.

Recommended answer

Consider first aggregating orders by customer_id, then merging the resulting customer_id-indexed DataFrame onto the desired columns of customers:

# Select order_total before summing so that order_id is not summed as well.
cust2spend = orders.groupby('customer_id')[['order_total']].sum().reset_index()
cust2spend
   customer_id  order_total
0          100      1211.84
1          101       602.42
2          102       786.80
3          103       961.96
4          104       651.22

# Before merging, rename the order_total column to total_spend.
# Note that axis=1 could also be axis='columns'.
cust2spend.rename({'order_total': 'total_spend'}, axis=1, inplace=True)

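# Keep only customer_id and name from customers, and store the result under the requested name.
customer_spend = pd.merge(customers[['customer_id', 'name']], cust2spend, on='customer_id')
customer_spend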
   customer_id                   name  total_spend
0          100      Prometheus Barwis      1211.84
1          101         Alain Hennesey       602.42
2          102            Chao Peachy       786.80
3          103  Somtochukwu Mouritsen       961.96
4          104        Elisabeth Berry       651.22
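
The same steps can also be written more compactly. The sketch below is only an illustration, not the answerer's code: it uses a trimmed-down stand-in for the question's data, and the names spend and alt are hypothetical. It relies on pandas named aggregation (available in reasonably recent pandas versions) to sum and rename in one step, and also shows a variant closer to the attempt in the question.

```python
import pandas as pd

# Small stand-in tables with the same column names as in the question.
customers = pd.DataFrame(
    [[100, 'Prometheus Barwis', 'prometheus.barwis@me.com'],
     [101, 'Alain Hennesey', 'alain.hennesey@facebook.com']],
    columns=['customer_id', 'name', 'email'])
orders = pd.DataFrame(
    [[1000, 100, 144.82], [1001, 100, 140.93], [1005, 101, 36.69]],
    columns=['order_id', 'customer_id', 'order_total'])

# Named aggregation sums order_total per customer and names the result
# total_spend in the same step, so no separate rename is needed.
spend = orders.groupby('customer_id', as_index=False).agg(
    total_spend=('order_total', 'sum'))

# Merge only the wanted customer columns with the per-customer totals.
customer_spend = customers[['customer_id', 'name']].merge(spend, on='customer_id')

# Variant closer to the attempt in the question: merge first, then group by
# both customer_id and name so that name survives the aggregation.
alt = (customers.merge(orders, on='customer_id')
       .groupby(['customer_id', 'name'], as_index=False)['order_total'].sum()
       .rename(columns={'order_total': 'total_spend'}))

print(customer_spend)
print(alt)
```

Both versions use an inner join, so a customer with no orders would be dropped from the result; passing how='left' to the merge in the first version would keep such customers, with a missing total_spend.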
