Groupby and append lists and strings
Question
I am trying to group by the values in my "value_1" column, but my last column is made up of lists. When I group by "value_1", the column made up of lists disappears.
DataFrame:
value_1:    value_2:          value_3:         list:
american    california, nyc   walmart, kmart   [supermarket, connivence]
canadian    toronto           dunkinDonuts     [coffee]
american    texas                              [state]
canadian                      walmart          [supermarket]
...         ...               ...              ...
My expected output is:
value_1:    value_2:                 value_3:                list:
american    california, nyc, texas   walmart, kmart          [supermarket, connivence, state]
canadian    toronto                  dunkinDonuts, walmart   [coffee, supermarket]
Thanks!
Recommended answer
Dynamically create a dictionary that maps every column except value_1 and list to a function that joins strings. For the list column, use a lambda function with a list comprehension that flattens the nested lists:
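# join the non-missing strings of each group with a separator
f1 = lambda x: ', '.join(x.dropna())
# alternative: join only the string values
#f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
# flatten the lists of each group into one list
f2 = lambda x: [z for y in x for z in y]
# map every column except value_1 and list to f1, and list to f2
d = dict.fromkeys(df.columns.difference(['value_1','list']), f1)
d['list'] = f2
df = df.groupby('value_1', as_index=False).agg(d)
print(df)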
    value_1                 value_2                value_3  \
0  american  california, nyc, texas         walmart, kmart
1  canadian                 toronto  dunkinDonuts, walmart

                               list
0  [supermarket, connivence, state]
1             [coffee, supermarket]
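For anyone who wants to reproduce the result, here is a minimal end-to-end sketch. It assumes that the blank cells in the question are NaN and that the list column holds real Python lists. The names `join_strings`, `flatten`, `agg_map` and `result` are illustrative only and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    'value_1': ['american', 'canadian', 'american', 'canadian'],
    'value_2': ['california, nyc', 'toronto', 'texas', np.nan],
    'value_3': ['walmart, kmart', 'dunkinDonuts', np.nan, 'walmart'],
    'list': [['supermarket', 'connivence'], ['coffee'], ['state'], ['supermarket']],
})

# Join the non-missing strings of a group.
join_strings = lambda x: ', '.join(x.dropna())
# Flatten the lists of a group into a single list.
flatten = lambda x: [z for y in x for z in y]

# Every column except the key and the list column gets join_strings.
agg_map = dict.fromkeys(df.columns.difference(['value_1', 'list']), join_strings)
agg_map['list'] = flatten

result = df.groupby('value_1', as_index=False).agg(agg_map)
print(result)
```

Building the dictionary from `df.columns` means the same code keeps working if more string columns are added later.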
Explanation:
f1 and f2 are lambda functions.
First remove missing values (if any exist) and join the strings with a separator:
f1 = lambda x: ', '.join(x.dropna())
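A small sketch of why the dropna step matters, using a made-up Series `s` that stands in for one group of a string column:

```python
import numpy as np
import pandas as pd

s = pd.Series(['california, nyc', np.nan, 'texas'])

f1 = lambda x: ', '.join(x.dropna())
print(f1(s))  # california, nyc, texas

# Without dropna the float NaN reaches str.join, which raises TypeError.
try:
    ', '.join(s)
except TypeError as exc:
    print('TypeError:', exc)
```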
Alternatively, keep only the string values (this omits missing values, because NaN is not a string) and join them with a separator:
f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
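This variant is also the safer choice when a column might contain other non-string values. A brief sketch with a hypothetical mixed Series (the values are invented for illustration):

```python
import numpy as np
import pandas as pd

mixed = pd.Series(['toronto', np.nan, 42, None, 'texas'])

f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
print(f1(mixed))  # toronto, texas
```

Here the dropna version would still fail, because the integer 42 is not missing but cannot be passed to str.join.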
Or, if missing values are stored as empty strings, filter out the empty strings and join the rest with a separator:
f1 = lambda x: ', '.join([y for y in x if y != ''])
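A quick sketch of the effect, assuming the blank cells are empty strings rather than NaN (the Series `s` is made up for illustration):

```python
import pandas as pd

s = pd.Series(['walmart, kmart', '', 'dunkinDonuts'])

f1 = lambda x: ', '.join([y for y in x if y != ''])
print(f1(s))         # walmart, kmart, dunkinDonuts

# For comparison, joining without the filter leaves a stray separator.
print(', '.join(s))  # walmart, kmart, , dunkinDonuts
```

Note that this filter does not remove NaN, so it only fits data where every cell is a string.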
Function f2 flattens the lists, because each group passes its lists to the function as a nested collection like [['a','b'], ['c']]:
f2 = lambda x: [z for y in x for z in y]
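The flattening step can be checked in isolation. The sketch below also shows `itertools.chain.from_iterable` as an equivalent alternative; `nested` and `flatten_chain` are illustrative names, not part of the original answer.

```python
from itertools import chain

import pandas as pd

# One group of the list column, as the aggregation function receives it.
nested = pd.Series([['a', 'b'], ['c']])

f2 = lambda x: [z for y in x for z in y]
print(f2(nested))  # ['a', 'b', 'c']

# Equivalent flattening with the standard library.
flatten_chain = lambda x: list(chain.from_iterable(x))
print(flatten_chain(nested))  # ['a', 'b', 'c']
```

Both versions assume every cell is a list. A NaN cell is not iterable and would raise a TypeError.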