Groupby and append lists and strings


Question

I am trying to group by the values in my "value_1" column, but my last column is made up of lists. When I group by "value_1", the column made up of lists disappears.

DataFrame:

 value_1:        value_2:           value_3:               list: 
 american     california, nyc      walmart, kmart      [supermarket, connivence] 
 canadian         toronto            dunkinDonuts      [coffee]
 american          texas                               [state]
 canadian                             walmart          [supermarket] 
   ...              ...                 ...              ....

My expected output is:

value_1:        value_2:              value_3:             list: 
american   california, nyc, texas   walmart, kmart      [supermarket, connivence, state] 
canadian         toronto         dunkinDonuts, walmart  [coffee, supermarket]

Thanks!

Recommended answer

Build a dictionary dynamically that maps every column except value_1 and list to a string-joining lambda, and map the list column to a lambda that flattens the nested lists with a list comprehension:

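# join the non-missing strings in each group
f1 = lambda x: ', '.join(x.dropna())
# alternative: join only the string values
# f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
# flatten the lists in each group into one list
f2 = lambda x: [z for y in x for z in y]
# map every column except value_1 and list to f1, and list to f2
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2

df = df.groupby('value_1', as_index=False).agg(d)
print(df)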
     value_1                 value_2                value_3  \
0   american  california, nyc, texas         walmart, kmart   
1   canadian                 toronto  dunkinDonuts, walmart   

                               list  
0  [supermarket, connivence, state]  
1             [coffee, supermarket]  
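For readers who want to run the answer end to end, here is a minimal self-contained sketch. The sample DataFrame is reconstructed from the question, and it assumes the blank cells are stored as NaN; the names join_strings, flatten, agg_map and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
# Assumption: the blank cells in the question are NaN.
df = pd.DataFrame({
    'value_1': ['american', 'canadian', 'american', 'canadian'],
    'value_2': ['california, nyc', 'toronto', 'texas', np.nan],
    'value_3': ['walmart, kmart', 'dunkinDonuts', np.nan, 'walmart'],
    'list': [['supermarket', 'connivence'], ['coffee'], ['state'], ['supermarket']],
})

# Join the non-missing strings of a group with ', '.
join_strings = lambda x: ', '.join(x.dropna())
# Flatten the lists of a group into a single list.
flatten = lambda x: [z for y in x for z in y]

# Every column except the grouping key and the list column gets join_strings.
agg_map = dict.fromkeys(df.columns.difference(['value_1', 'list']), join_strings)
agg_map['list'] = flatten

result = df.groupby('value_1', as_index=False).agg(agg_map)
print(result)
```

Because the dictionary is built from df.columns, the same code keeps working if more string columns are added later.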

Explanation:

f1 and f2 are lambda functions.

First remove missing values (if any) and join the strings with a separator:

f1 = lambda x: ', '.join(x.dropna())

Alternatively, keep only the string values (this skips missing values, because NaN is not a string) and join them with a separator:

f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])

Or filter out empty strings and join the remaining values with a separator (useful when missing cells hold '' rather than NaN):

f1 = lambda x: ', '.join([y for y in x if y != '']) 
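Which variant of f1 fits depends on how the missing cells are stored. The sketch below compares the three on two small, made-up Series; the names with_nan, with_empty, join_dropna, join_str_only, join_non_empty and join_clean are illustrative, and join_clean is a combined filter added here as an assumption, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: one with both NaN and '', one with '' only.
with_nan = pd.Series(['california, nyc', np.nan, '', 'texas'], dtype=object)
with_empty = pd.Series(['california, nyc', '', 'texas'], dtype=object)

join_dropna = lambda x: ', '.join(x.dropna())
join_str_only = lambda x: ', '.join([y for y in x if isinstance(y, str)])
join_non_empty = lambda x: ', '.join([y for y in x if y != ''])
# Combined filter (an addition for illustration): skip NaN and empty strings.
join_clean = lambda x: ', '.join([y for y in x if isinstance(y, str) and y != ''])

print(join_dropna(with_nan))       # NaN is dropped, the empty string stays
print(join_str_only(with_nan))     # same result as above
print(join_non_empty(with_empty))  # empty strings are dropped
print(join_clean(with_nan))        # both NaN and empty strings are dropped

# The empty-string filter alone does not remove NaN, so join() rejects the float.
try:
    join_non_empty(with_nan)
except TypeError as exc:
    print('join_non_empty fails on NaN:', exc)
```

If a column can contain both NaN and empty strings, a combined filter like join_clean avoids both the stray separator and the TypeError.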

Function f2 flattens the lists, because aggregation otherwise yields nested lists like [['a','b'], ['c']]:

f2 = lambda x: [z for y in x for z in y]
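To see the flattening step in isolation, the short sketch below applies the same comprehension to a plain nested list and compares it with itertools.chain.from_iterable from the standard library, which is an equivalent alternative not used in the original answer. The variable names are illustrative.

```python
from itertools import chain

# The lists of one group, as they would arrive in the aggregation function.
nested = [['supermarket', 'connivence'], ['state']]

# Same comprehension as f2: outer loop over the lists, inner loop over items.
flatten = lambda x: [z for y in x for z in y]
flat = flatten(nested)

# Standard-library alternative that produces the same flat list.
flat_chain = list(chain.from_iterable(nested))

print(flat)
print(flat_chain)
```

Both approaches assume every cell in the column is a list; a NaN cell would need to be filtered out or replaced with an empty list first.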
