Pandas rolling sum on string column


Question

I'm using Python3 with pandas version '0.19.2'.

I have a pandas df as follows:

chat_id    line
1          'Hi.'
1          'Hi, how are you?.'
1          'I'm well, thanks.'
2          'Is it going to rain?.'
2          'No, I don't think so.'

I want to group by 'chat_id', then do something like a rolling sum on 'line' to get the following:

chat_id    line                     conversation
1          'Hi.'                    'Hi.'
1          'Hi, how are you?.'      'Hi. Hi, how are you?.'
1          'I'm well, thanks.'      'Hi. Hi, how are you?. I'm well, thanks.'
2          'Is it going to rain?.'  'Is it going to rain?.'
2          'No, I don't think so.'  'Is it going to rain?. No, I don't think so.'

I believe df.groupby('chat_id')['line'].cumsum() would only work on a numeric column.

I have also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in the full conversation, but then I can't figure out how to unpack that list to create the 'rolling sum' style conversation column.
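To make the goal concrete: for a single conversation held as a plain list, the "rolling sum" is a running concatenation. A minimal sketch using only the standard library is shown here; the variable names and the single-space separator are assumptions taken from the desired output, and the quotes in the table are treated as display only.

```python
from itertools import accumulate

# One conversation's lines, as produced by collecting a group into a list
lines = ["Hi.", "Hi, how are you?.", "I'm well, thanks."]

# accumulate() yields the running result after each element; here the
# "sum" is string concatenation with a single space as the separator.
conversation = list(accumulate(lines, lambda so_far, line: so_far + " " + line))

for text in conversation:
    print(text)
```

Each element of `conversation` contains every line up to and including that position, which is the shape the `conversation` column needs within one `chat_id`.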

Recommended answer

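This works for me: strip the surrounding quotes, then take a cumulative sum of the strings within each chat_id group, appending a space to each line as the separator: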


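# Strip the literal quote characters that surround each line
df['line'] = df['line'].str.strip("'")
# Per chat: append a space to each line, concatenate cumulatively, trim the trailing space, re-add the quotes.
# group_keys=False keeps the original row index so the result aligns with df on newer pandas versions.
df['new'] = df.groupby('chat_id', group_keys=False)['line'].apply(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print(df)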
   chat_id                   line  \
0        1                    Hi.   
1        1      Hi, how are you?.   
2        1      I'm well, thanks.   
3        2  Is it going to rain?.   
4        2  No, I don't think so.   

                                             new  
0                                          'Hi.'  
1                        'Hi. Hi, how are you?.'  
2      'Hi. Hi, how are you?. I'm well, thanks.'  
3                        'Is it going to rain?.'  
4  'Is it going to rain?. No, I don't think so.' 
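As an alternative to calling cumsum on strings, the same running concatenation can be sketched with `groupby(...).transform` and `itertools.accumulate`. This is only an illustration under a few assumptions: the sample frame is rebuilt by hand without the literal quote characters, `running_join` is a hypothetical helper name, and the result column is called `conversation` as in the question.

```python
from itertools import accumulate

import pandas as pd

# Sample data rebuilt from the question (quotes treated as display only)
df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})


def running_join(lines):
    """Hypothetical helper: running concatenation of one group's lines."""
    joined = accumulate(lines, lambda so_far, line: so_far + " " + line)
    # Reuse the group's index so the result lines up with the original rows
    return pd.Series(list(joined), index=lines.index)


# transform() returns a result indexed like df, so it can be assigned directly
df["conversation"] = df.groupby("chat_id")["line"].transform(running_join)
print(df["conversation"].tolist())
```

Because `transform` returns a result with the same index as the input, no index handling is needed before assigning the new column.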
