Pandas rolling sum on string column
Question
I'm using Python 3 with pandas version 0.19.2.
I have a pandas DataFrame (df) as follows:
chat_id line
1 'Hi.'
1 'Hi, how are you?.'
1 'I'm well, thanks.'
2 'Is it going to rain?.'
2 'No, I don't think so.'
I want to group by 'chat_id', then do something like a rolling sum on 'line' to get the following:
chat_id line conversation
1 'Hi.' 'Hi.'
1 'Hi, how are you?.' 'Hi. Hi, how are you?.'
1 'I'm well, thanks.' 'Hi. Hi, how are you?. I'm well, thanks.'
2 'Is it going to rain?.' 'Is it going to rain?.'
2 'No, I don't think so.' 'Is it going to rain?. No, I don't think so.'
I believe df.groupby('chat_id')['line'].cumsum() would only work on a numeric column.
I have also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in the full conversation, but I can't figure out how to unpack that list to create the 'rolling sum' style conversation column.
Recommended answer
For me, apply with Series.cumsum works; if you need a separator, add a space:
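# Append a space to each line, take the cumulative sum within each chat, then strip the trailing space
df['new'] = df.groupby('chat_id')['line'].apply(lambda x: (x + ' ').cumsum().str.strip())
print(df)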
chat_id line new
0 1 Hi. Hi.
1 1 Hi, how are you?. Hi. Hi, how are you?.
2 1 I'm well, thanks. Hi. Hi, how are you?. I'm well, thanks.
3 2 Is it going to rain?. Is it going to rain?.
4 2 No, I don't think so. Is it going to rain?. No, I don't think so.
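The snippet above was written against pandas 0.19.2. In more recent pandas releases, as far as I can tell, groupby(...).apply may prepend the group key to the index of the result, in which case assigning it straight back to df may no longer align. A minimal sketch of a variant that avoids this uses groupby(...).transform, whose result is indexed like the original frame. Here running_join is a hypothetical helper (not part of pandas), and itertools.accumulate stands in for the cumsum call:

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})


def running_join(lines, sep=" "):
    """Hypothetical helper: the i-th value joins lines[0..i] with sep."""
    joined = accumulate(lines, lambda so_far, nxt: so_far + sep + nxt)
    # Reuse the group's index so the result lines up with the original rows.
    return pd.Series(list(joined), index=lines.index)


# transform calls running_join once per chat and returns a Series
# indexed like df, so it can be assigned directly as a new column.
df["conversation"] = df.groupby("chat_id")["line"].transform(running_join)
print(df)
```

Because the separator is only inserted between lines, no trailing space has to be stripped afterwards.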
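# If the values in 'line' contain literal quote characters, strip them first,
# then wrap each running conversation in quotes again
df['line'] = df['line'].str.strip("'")
df['new'] = df.groupby('chat_id')['line'].apply(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print(df)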
chat_id line \
0 1 Hi.
1 1 Hi, how are you?.
2 1 I'm well, thanks.
3 2 Is it going to rain?.
4 2 No, I don't think so.
new
0 'Hi.'
1 'Hi. Hi, how are you?.'
2 'Hi. Hi, how are you?. I'm well, thanks.'
3 'Is it going to rain?.'
4 'Is it going to rain?. No, I don't think so.'
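For readers who started from the apply(list) attempt in the question, the lists can also be unpacked into running conversations by joining successively longer prefixes. This is only a sketch under two assumptions: the groupby is called without as_index=False (unlike the question's attempt), and all rows of a chat sit next to each other in df, so the flattened result lines up row by row. The names lines_per_chat and running are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})

# One list of lines per chat; sort=False keeps chats in order of first appearance.
lines_per_chat = df.groupby("chat_id", sort=False)["line"].apply(list)

# For each chat, join the first 1, 2, ..., n lines to get the running conversation.
running = [
    " ".join(lines[: i + 1])
    for lines in lines_per_chat
    for i in range(len(lines))
]

# Assumption: rows of each chat are contiguous in df, so positions match.
df["conversation"] = running
print(df)
```

If the rows of a chat are interleaved with other chats, this positional assignment would misalign, and an index-aware approach is the safer choice.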