Vectorized solution to conditional dataframe selection


Problem description

I recently asked a question which was answered - How do I add conditionally to a selection of cells in a pandas dataframe column when the column is a series of lists?, but I believe I have a new problem which I had not previously considered.

In the following dataframe I need two conditions to result in a change to column d. Each value in column d is a list.

  • Where a == b, the final integer in d is incremented by one.
  • Where a != b, the list of integers is extended and the value 1 is inserted at the end of the list in column d.

    a       b       c           d           
    On      On      [0]         [0,3]       
    On      Off     [0]         [0,1]
    On      On      [0]         [2]         
    On      On      [0]         [0,4,4]         
    On      Off     [0]         [0]
    

As a result, the dataframe would look like this:

    a       b       c       d       
    On      On      [0]     [0,4]       
    On      Off     [0]     [0,1,1]     
    On      On      [0]     [3]
    On      On      [0]     [0,4,5] 
    On      Off     [0]     [0,1]
    

I realise that this can be done using the pd.Series.apply method together with a predefined function or a lambda. However, the dataframe consists of 100000 rows, and I was hoping that a vectorized solution to these two conditions might exist.
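For reference, the row-wise approach mentioned above might look like the following minimal sketch. The sample frame is rebuilt from the tables in the question, and the helper name `update_d` is hypothetical. This is the slow baseline the question wants to avoid, not a vectorized solution.

```python
import pandas as pd

# Sample frame rebuilt from the input table in the question
df = pd.DataFrame({
    "a": ["On", "On", "On", "On", "On"],
    "b": ["On", "Off", "On", "On", "Off"],
    "c": [[0], [0], [0], [0], [0]],
    "d": [[0, 3], [0, 1], [2], [0, 4, 4], [0]],
})

def update_d(row):
    """Hypothetical helper: return the new list for column d of one row."""
    lst = row["d"]
    if row["a"] == row["b"]:
        # a == b: increment the final integer (a new list is built)
        return lst[:-1] + [lst[-1] + 1]
    # a != b: append the value 1
    return lst + [1]

# axis=1 calls update_d once per row, so this is a Python-level loop
df["d"] = df.apply(update_d, axis=1)
print(df)
```

Because `update_d` runs once per row in Python, the cost grows linearly with the number of rows, which is the concern with 100000 rows.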

Solution

As Edchum says, a vectorized solution can be problematic.

One non-vectorized solution applies custom functions:

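# df is the dataframe from the question.
# Keep a temporary copy of column d so both assignments read the original lists.
df['e'] = df['d']

def exten(lst):
    # a != b: return a new list with 1 appended
    return lst + [1]

def incre(lst):
    # a == b: return a new list with the last integer incremented
    # (building a new list avoids mutating the lists shared by d and e)
    return lst[:-1] + [lst[-1] + 1]

df.loc[df.a != df.b, 'd'] = df.e.apply(exten)
df.loc[df.a == df.b, 'd'] = df.e.apply(incre)
df = df.drop('e', axis=1)
print(df)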
    a    b    c          d
0  On   On  [0]     [0, 4]
1  On  Off  [0]  [0, 1, 1]
2  On   On  [0]        [3]
3  On   On  [0]  [0, 4, 5]
4  On  Off  [0]     [0, 1]
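A single pass over the rows is another option. The sketch below is not part of the original answer and is still not truly vectorized: only the comparison `a == b` runs as a vectorized operation, because an object column holding Python lists has no array-level operation for editing each list. The names `same` and `new_d` are illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the input table in the question
df = pd.DataFrame({
    "a": ["On", "On", "On", "On", "On"],
    "b": ["On", "Off", "On", "On", "Off"],
    "c": [[0], [0], [0], [0], [0]],
    "d": [[0, 3], [0, 1], [2], [0, 4, 4], [0]],
})

# The comparison itself is vectorized and yields one boolean per row
same = (df["a"] == df["b"]).tolist()

# One Python-level pass: each list is visited once and a new list is built
new_d = [
    lst[:-1] + [lst[-1] + 1] if eq else lst + [1]
    for lst, eq in zip(df["d"], same)
]

# Wrapping in a Series with the original index keeps the rows aligned
df["d"] = pd.Series(new_d, index=df.index)
print(df)
```

Compared with the two `apply` calls, this avoids the temporary column and touches each row once instead of running both functions over every row.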
