Pandas DataFrame with a column of lists: create a column of cumulative sets and record-by-record differences

Question

I have a simple dataframe df with a column of lists named lists. I would like to generate 3 additional columns based on lists.

df looks like this:

import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
df

          lists
1           [1]
2     [1, 2, 3]
3  [2, 9, 7, 9]
4  [2, 7, 3, 5]

I would like df to look like this:

    lists     cumset        adds    drops
1   [1]       {1}           {1}     {}
2   [1,2,3]   {1,2,3}       {2,3}   {}
3   [2,9,7,9] {1,2,3,7,9}   {7,9}   {1,3}
4   [2,7,3,5] {1,2,3,5,7,9} {3,5}   {9}

Basically, I need to figure out how to create cumset (some kind of apply? Is there already a pandas function for this?). Then, for adds and drops, we want to compare df.lists with df.lists.shift() and determine which items are new and which are missing. Maybe something like:

df['adds']=df[['lists',df.lists.shift()]].apply(lambda x: {i for i in x.lists if i not in x.lists.shift()}, axis=1)  

Have fun, and thanks.

Recommended answer

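You can use pandas.Series.cumsum to build the cumulative column (applied to a column of lists, it concatenates them row by row), convert each list to a set in a helper column, and use pandas.Series.shift to build the "add" and "drop" columns. Note that np.unique returns a sorted NumPy array of unique values, so cumset holds arrays rather than Python sets: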

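import pandas as pd
import numpy as np

# Cumulative concatenation of the lists, reduced to the sorted unique values
df['cumset'] = df['lists'].cumsum().apply(lambda x: np.unique(x))
# Helper column holding each row's list as a set
df['sets'] = df['lists'].apply(lambda x: set(x))

# Previous row's set; the first row has no predecessor, so use an empty set
shifted = df['sets'].shift(1).apply(lambda x: x if not pd.isnull(x) else set())

# Element-wise set differences against the previous row
df['add'] = df['sets'] - shifted
df['drop'] = shifted - df['sets']
# The helper column is no longer needed
df = df.drop('sets', axis=1)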

print(df)
#-->Output:
          lists              cumset     add    drop
1           [1]                 [1]     {1}      {}
2     [1, 2, 3]           [1, 2, 3]  {2, 3}      {}
3  [2, 9, 7, 9]     [1, 2, 3, 7, 9]  {9, 7}  {1, 3}
4  [2, 7, 3, 5]  [1, 2, 3, 5, 7, 9]  {3, 5}     {9}
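
In the output above, cumset contains NumPy arrays, while the question asked for sets. The following is a minimal sketch of one way to get actual Python sets. It swaps cumsum and np.unique for itertools.accumulate with a set union, so it is an alternative technique and not the method used in the answer. The variable names sets and previous are illustrative, and the column names adds and drops follow the question rather than the answer.

```python
from itertools import accumulate

import pandas as pd

# Same test frame as in the question.
df = pd.DataFrame(
    {"lists": [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]]},
    index=[1, 2, 3, 4],
)

# Convert each row's list to a set once (illustrative helper, not a column).
sets = [set(items) for items in df["lists"]]

# Running union of every set seen so far; each step builds a new set.
df["cumset"] = list(accumulate(sets, lambda seen, current: seen | current))

# The previous row's set; the first row is compared against an empty set.
previous = [set()] + sets[:-1]

# Items present now but not in the previous row, and the reverse.
df["adds"] = [cur - prev for cur, prev in zip(sets, previous)]
df["drops"] = [prev - cur for cur, prev in zip(sets, previous)]

print(df)
```

Working on plain Python lists of sets avoids relying on how a given pandas version handles cumsum or subtraction on object columns, at the cost of stepping outside vectorised pandas operations.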
