Concatenate strings from several rows using Pandas groupby
Question
I want to merge several strings in a DataFrame based on a groupby in Pandas.
This is my code so far:
import pandas as pd
from io import StringIO
data = StringIO("""
"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
""")
# load string as stream into dataframe (the data has no header row, so use header=None)
df = pd.read_csv(data, header=None, names=["name","text","date"], parse_dates=["date"])
# add column with month
df["month"] = df["date"].dt.month
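A note on loading the data: when a CSV has no header line and column names are supplied through names, passing header=0 tells pandas that the first line is a header to be replaced by names, so the first data row is silently discarded; header=None keeps every line as data. The minimal sketch below demonstrates this on a three-row excerpt of the question's data (the variable names csv_text, with_header0 and with_header_none are just illustrative).

```python
import pandas as pd
from io import StringIO

# Three rows from the question's data; there is no header line
csv_text = '''"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name2","fin","2014-11-01"
'''

# header=0 + names: line 1 is treated as a header and replaced by `names`, so it is lost
with_header0 = pd.read_csv(StringIO(csv_text), header=0, names=["name", "text", "date"])

# header=None + names: every line is kept as a data row
with_header_none = pd.read_csv(
    StringIO(csv_text), header=None, names=["name", "text", "date"], parse_dates=["date"]
)
# .dt.month is the vectorized equivalent of apply(lambda x: x.month)
with_header_none["month"] = with_header_none["date"].dt.month

print(len(with_header0))                   # 2
print(len(with_header_none))               # 3
print(with_header_none["text"].tolist())   # ['hej', 'du', 'fin']
```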
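I want the end result to have one row per name and month, with the text entries of each group joined, like this:
name   text         month
name1  hej,du       11
name1  aj,oj        12
name2  fin,katt     11
name2  mycket,lite  12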
I don't understand how to use groupby and apply some sort of concatenation to the strings in the "text" column. Any help is appreciated!
Answer
You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and pass it a lambda that joins the text entries:
In [119]:
df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12
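Here I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates, which keeps one row per name/month group because every row in a group now has the same joined text.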
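The sketch below, on a small hypothetical frame shaped like the question's df, makes the alignment property of transform visible: the joined Series has exactly the original index, one value per row. It also shows a variant (not the answer's exact code) that writes the joined text into a copy via assign instead of overwriting df['text'], and restricts drop_duplicates to the key columns with subset.

```python
import pandas as pd

# Toy data shaped like the question's df (name, text, month)
df = pd.DataFrame({
    "name":  ["name1", "name1", "name1", "name2"],
    "text":  ["hej", "du", "aj", "fin"],
    "month": [11, 11, 12, 11],
})

# transform broadcasts each group's joined string back to every row of that group
joined = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))
print(joined.tolist())  # ['hej,du', 'hej,du', 'aj', 'fin']

# Variant: leave df untouched, put the joined text in a copy,
# then keep the first row of each (name, month) group
result = (
    df.assign(text=joined)
      .drop_duplicates(subset=["name", "month"])
      .loc[:, ["name", "text", "month"]]
)
print(result)
```

Because drop_duplicates keeps the first occurrence by default, the surviving rows keep their original index labels (0, 2 and 3 here), just as the answer's output shows 0, 2, 4, 6.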
EDIT: Actually, I can just call apply and then reset_index:
In [124]:
df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
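Since the function reduces each group to a single string, SeriesGroupBy.agg can be used in place of apply with the same result. This is an alternative spelling rather than the answer's own code; the sketch below rebuilds the question's rows directly (skipping the date parsing) and checks that both forms produce the same table.

```python
import pandas as pd

# The question's rows, with the month already extracted
df = pd.DataFrame({
    "name":  ["name1"] * 4 + ["name2"] * 4,
    "text":  ["hej", "du", "aj", "oj", "fin", "katt", "mycket", "lite"],
    "month": [11, 11, 12, 12, 11, 11, 12, 12],
})

# apply + reset_index, as in the answer
via_apply = df.groupby(["name", "month"])["text"].apply(lambda x: ",".join(x)).reset_index()

# agg expresses the same per-group reduction explicitly
via_agg = df.groupby(["name", "month"])["text"].agg(lambda x: ",".join(x)).reset_index()

print(via_agg)
```

Both calls return a Series indexed by (name, month), and reset_index turns those index levels back into ordinary columns.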
UPDATE
The lambda is unnecessary here; ','.join can be passed directly:
In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()
Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
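The lambda can be dropped because ','.join is already a callable: a method bound to the string ',' that accepts any iterable of strings, and iterating a pandas Series yields its values. A minimal sketch on a hypothetical four-row frame:

```python
import pandas as pd

# ",".join is a bound method; calling it on a Series joins the Series' values
joiner = ",".join
print(joiner(pd.Series(["hej", "du"])))  # hej,du

df = pd.DataFrame({
    "name":  ["name1", "name1", "name2", "name2"],
    "text":  ["hej", "du", "fin", "katt"],
    "month": [11, 11, 11, 11],
})

# Pass the bound method straight to apply, no lambda needed
out = df.groupby(["name", "month"])["text"].apply(",".join).reset_index()
print(out["text"].tolist())  # ['hej,du', 'fin,katt']
```

str.join only accepts strings, so if the text column can contain missing values (NaN, which is a float), those need to be dropped or converted before joining; otherwise a TypeError is raised.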