Concatenate strings from several rows using Pandas groupby
Question
I want to merge several strings in a DataFrame based on a groupby in Pandas.
This is my code so far:
import pandas as pd
from io import StringIO
data = StringIO("""
"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
""")
# load string as stream into dataframe; header=None because the data has no header row
df = pd.read_csv(data, header=None, names=["name", "text", "date"], parse_dates=[2])
# add column with month
df["month"] = df["date"].dt.month
I want the end result to look like this:
I don't understand how I can use groupby and apply some sort of concatenation to the strings in the column "text". Any help is appreciated!
Answer
You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda where we join the text entries:
In [119]:
df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12
Here I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates so that only one row per group remains.
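To see what transform produces before drop_duplicates runs, here is a minimal sketch on a small hypothetical frame with the same columns as the question (month already computed):

```python
import pandas as pd

# Hypothetical four-row frame mirroring the question's columns.
df = pd.DataFrame({
    "name":  ["name1", "name1", "name1", "name1"],
    "text":  ["hej", "du", "aj", "oj"],
    "month": [11, 11, 12, 12],
})

# transform returns one value per original row, aligned on the original index,
# so every row of a group receives the same joined string.
joined = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))
print(joined.tolist())  # ['hej,du', 'hej,du', 'aj,oj', 'aj,oj']

# Writing it back makes rows within a group identical; drop_duplicates keeps the first of each.
df["text"] = joined
result = df[["name", "text", "month"]].drop_duplicates()
print(result)
```

Because transform keeps the original row count, the drop_duplicates step is what collapses each group to one row; the surviving rows keep their original index labels (0 and 2 here).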
EDIT: Actually, I can just call apply and then reset_index:
In [124]:
df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
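The apply step returns a Series indexed by the group keys, and reset_index turns those index levels back into ordinary columns. A small sketch on a hypothetical frame built from the question's rows (month precomputed, names of intermediate variables are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["name1", "name1", "name1", "name1", "name2", "name2", "name2", "name2"],
    "text":  ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "month": [11, 11, 12, 12, 11, 11, 12, 12],
})

# One joined string per (name, month) group; the group keys become a MultiIndex.
s = df.groupby(["name", "month"])["text"].apply(lambda x: ",".join(x))
print(list(s.index.names))  # ['name', 'month']

# reset_index moves the index levels into columns and gives a fresh 0..n-1 index.
out = s.reset_index()
print(out.columns.tolist())  # ['name', 'month', 'text']
print(out)
```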
UPDATE: The lambda is unnecessary here; ','.join can be passed directly:
In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()
Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
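The lambda can be dropped because ",".join is already a bound method that takes one iterable of strings. One caveat worth checking: str.join accepts only strings, so a group containing a non-string value such as NaN raises TypeError. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["name1", "name1", "name2", "name2"],
    "text":  ["aj", "oj", "mycket", "lite"],
    "month": [12, 12, 12, 12],
})

# ",".join is itself a callable taking an iterable of strings,
# so wrapping it in `lambda x: ",".join(x)` adds nothing.
out = df.groupby(["name", "month"])["text"].apply(",".join).reset_index()
print(out)

# str.join rejects non-string items: a NaN (a float) in a group raises TypeError.
try:
    ",".join(pd.Series(["aj", float("nan")]))
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True
```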