python: cumulative concatenate in pandas dataframe

Problem description

How can I do a cumulative concatenation in a pandas dataframe? I found a number of solutions in R, but couldn't find one in Python.

Here is the problem: suppose we have a dataframe with two columns, date and name:

import pandas as pd

d = {'date': [1,1,2,2,3,3,3,4,4,4], 'name':['A','B','A','C','A','B','B','A','B','C']}
df = pd.DataFrame(data=d)

I want to get CUM_CONCAT, which is a cumulative concatenation of name within each date group:

    date name  CUM_CONCAT
0     1    A      [A]
1     1    B      [A,B]
2     2    A      [A]
3     2    C      [A,C]
4     3    A      [A]
5     3    B      [A,B]
6     3    B      [A,B,B]
7     4    A      [A]
8     4    B      [A,B]
9     4    C      [A,B,C]

So far I've tried:

temp = df.groupby(['date'])['name'].apply(list)
df = df.join(temp, 'date', rsuffix='_cum_concat')

What I get is:

    date name  CUM_CONCAT
0     1    A      [A,B]
1     1    B      [A,B]
2     2    A      [A,C]
3     2    C      [A,C]
4     3    A      [A,B,B]
5     3    B      [A,B,B]
6     3    B      [A,B,B]
7     4    A      [A,B,C]
8     4    B      [A,B,C]
9     4    C      [A,B,C]

I know there are .rolling and cumsum functions, which are similar to what I need, but they are mainly for cumulative sums, not for concatenation.

Any help will be appreciated!

Recommended answer

I came up with the following solution:

In terms of running time, both solutions (mine and @Wen-Ben's) seem similar, though his code is shorter.

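from itertools import accumulate

def cum_concat(x):
    # accumulate() applies + by default, so a sequence of one-element lists
    # yields the running concatenation: [A], [A, B], [A, B, B], ...
    return list(accumulate(x))

# Wrap each name in its own list so that + concatenates lists rather than strings.
f = lambda x: cum_concat([[i] for i in x])
b = df.groupby(['date'])['name'].apply(f)
# Flatten the per-date results back into one row-aligned list.
# This assumes the rows of df are already sorted by date, as in the example.
df['CUM_CONCAT'] = [item for sublist in b for item in sublist]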

df
Out: 
   date name CUM_CONCAT
0     1    A        [A]
1     1    B     [A, B]
2     2    A        [A]
3     2    C     [A, C]
4     3    A        [A]
5     3    B     [A, B]
6     3    B  [A, B, B]
7     4    A        [A]
8     4    B     [A, B]
9     4    C  [A, B, C]
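
The flattening step above lines up with the rows only when the dataframe is already ordered by date. If that cannot be guaranteed, one possible variant is to walk the rows in their existing order and keep a running list per date. The sketch below is not part of the original answer; the helper name `cumulative_concat` and the small `unsorted_df` frame are made up for illustration.

```python
import pandas as pd

def cumulative_concat(frame, key, value):
    """Return, row by row, the list of `value` entries seen so far for that row's `key`."""
    running = {}  # key -> list accumulated so far for that key
    result = []
    for k, v in zip(frame[key], frame[value]):
        # Build a new list each time so lists stored for earlier rows are not mutated.
        running[k] = running.get(k, []) + [v]
        result.append(running[k])
    return result

# Hypothetical data with the dates deliberately interleaved.
unsorted_df = pd.DataFrame({'date': [2, 1, 2, 1], 'name': ['A', 'A', 'C', 'B']})
unsorted_df['CUM_CONCAT'] = cumulative_concat(unsorted_df, 'date', 'name')
print(unsorted_df)
```

Because the result is built in row order, it can be assigned straight to a new column without relying on how groupby orders its groups. The trade-off is a plain Python loop over every row, which may be slower on large frames.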
