Pandas, create columns after groupby


Question

Regarding the Pandas DataFrame 'test_df':

 id_customer   id_order   product_name
    3             78        product1
    3             79        product2
    3             80        product3
    7             100       product4
    9             109       product5

After a groupby on 'id_customer' how is it possible to get:

 id_customer order_1     order_2   product_name_1   product_name_2
    3          78           79           product1         product2
    7          100                       product4      
    9          109                       product5

The goal is to keep, for each 'id_customer', the first min(2, number of rows matching that 'id_customer') rows after the groupby, and then fill as many of the above fields as possible.

I started with

def order_to_col(my_dataframe_df,my_list):
  for num in range(0,min(len(my_list),2)):
    my_dataframe_df['order_'+str(num)] = my_list[num]

test_df = test_df.groupby('id_customer').apply(lambda x: order_to_col(test_df,list(x.id_order)))

but I'm quite sure this is not the right approach.

Solution

Note: I recommend using head to do this rather than using multiple columns:

In [11]: g = df.groupby('id_customer')

In [12]: g.head(2)
Out[12]:
   id_customer  id_order product_name
0            3        78     product1
1            3        79     product2
3            7       100     product4
4            9       109     product5


You can take the 0th and 1st row of each group using nth and then concat these:

In [21]: g = df.groupby('id_customer')

In [22]: g[['id_order', 'product_name']].nth(0)
Out[22]:
             id_order product_name
id_customer
3                  78     product1
7                 100     product4
9                 109     product5

In [23]: g[['id_order', 'product_name']].nth(1)
Out[23]:
             id_order product_name
id_customer
3                  79     product2

In [24]: a = g[['id_order', 'product_name']].nth(0)
         b = g[['id_order', 'product_name']].nth(1)

In [25]: pd.concat([a, b], axis=1)
Out[25]:
             id_order product_name  id_order product_name
id_customer
3                  78     product1        79     product2
7                 100     product4       NaN          NaN
9                 109     product5       NaN          NaN
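
The concat result above has duplicate column names (id_order and product_name each appear twice), and the index returned by nth has differed between pandas versions. As a minimal sketch of one way to get the column names asked for in the question, the rows can be numbered within each group with cumcount and then pivoted with unstack. The helper names first_two, pos and wide are hypothetical, and renaming id_order to order is an assumption made only to match the desired output.

```python
import pandas as pd

# Sample data copied from the question.
test_df = pd.DataFrame({
    "id_customer": [3, 3, 3, 7, 9],
    "id_order": [78, 79, 80, 100, 109],
    "product_name": ["product1", "product2", "product3", "product4", "product5"],
})

# Keep at most the first two rows per customer (as g.head(2) does above).
first_two = test_df.groupby("id_customer").head(2).copy()

# Number the kept rows within each customer: 1, 2 ("pos" is a hypothetical name).
first_two["pos"] = first_two.groupby("id_customer").cumcount() + 1

# Pivot so that each position gets its own set of columns.
wide = first_two.set_index(["id_customer", "pos"]).unstack("pos")

# Flatten the (column, pos) MultiIndex into names such as order_1.
# Mapping id_order -> order is an assumption to match the question's layout.
wide.columns = [
    f"{'order' if col == 'id_order' else col}_{pos}" for col, pos in wide.columns
]
wide = wide.reset_index()
print(wide)
```

Customers with a single order end up with missing values in order_2 and product_name_2, which also means the order columns may be stored as floats rather than integers.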
