Vectorized solution to conditional dataframe selection
Problem description
I recently asked a question that was answered ("How do I add conditionally to a selection of cells in a pandas dataframe column when the column is a series of lists?"), but I believe I now have a new problem that I had not previously considered.
In the following dataframe, two conditions must each lead to a change in column `d`. Each value in column `d` is a `list`.

- Where `a == b`, the final integer in the list in `d` is incremented by one.
- Where `a != b`, the list in `d` is extended by appending the value `1` to its end.

| a | b | c | d |
|---|---|---|---|
| On | On | [0] | [0, 3] |
| On | Off | [0] | [0, 1] |
| On | On | [0] | [2] |
| On | On | [0] | [0, 4, 4] |
| On | Off | [0] | [0] |
As a result, the dataframe would look like this:
| a | b | c | d |
|---|---|---|---|
| On | On | [0] | [0, 4] |
| On | Off | [0] | [0, 1, 1] |
| On | On | [0] | [3] |
| On | On | [0] | [0, 4, 5] |
| On | Off | [0] | [0, 1] |
I realise that this can be done with the `pd.Series.apply` method and either a predefined function or a `lambda`. However, the dataframe has 100,000 rows, so I was hoping that a vectorized solution for these two conditions might exist.
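For comparison, both rules can also be written as a single list comprehension over `zip`. This is a minimal sketch that assumes the example frame rebuilt from the tables above and that every list where `a == b` is non-empty. It is still a plain Python loop, not true vectorization, but it makes one pass over column `d` instead of two `apply` calls and builds new lists rather than mutating the originals.

```python
import pandas as pd

# Example frame rebuilt from the tables in the question
df = pd.DataFrame({'a': ['On'] * 5,
                   'b': ['On', 'Off', 'On', 'On', 'Off'],
                   'c': [[0] for _ in range(5)],
                   'd': [[0, 3], [0, 1], [2], [0, 4, 4], [0]]})

# Boolean mask computed in one vectorized comparison
same = (df['a'] == df['b']).to_numpy()

# One pass over the lists: increment the last item where a == b
# (assumes those lists are non-empty), otherwise append 1.
# New lists are built, so the original list objects are not modified.
df['d'] = [lst[:-1] + [lst[-1] + 1] if eq else lst + [1]
           for lst, eq in zip(df['d'], same)]

print(df['d'].tolist())
# [[0, 4], [0, 1, 1], [3], [0, 4, 5], [0, 1]]
```

Whether this is noticeably faster than `apply` on the real 100,000-row frame is something to measure rather than assume.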
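Answer
As Edchum says, a vectorised solution can be problematic here: column `d` holds Python lists (object dtype), so operations on those lists run one cell at a time in Python rather than as vectorized NumPy operations.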
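One way to see this is to inspect column `d` directly. The sketch below rebuilds it as a standalone Series named `d` (an assumption, taken from the example data). Its dtype is `object`, each cell is a reference to a Python list, and even a list-aware pandas helper such as `Series.str.get`, which also extracts elements from lists, works by calling a Python function once per cell.

```python
import pandas as pd

# Column d from the example data, as a standalone Series
d = pd.Series([[0, 3], [0, 1], [2], [0, 4, 4], [0]])

print(d.dtype)          # object
print(type(d.iloc[0]))  # <class 'list'>

# .str.get also works on lists; it maps a Python function over the cells
last = d.str.get(-1)
print(last.tolist())    # [3, 1, 2, 4, 0]
```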
One non-vectorized solution, using `apply` with custom functions:
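```python
import pandas as pd

# example data from the question
df = pd.DataFrame({'a': ['On'] * 5,
                   'b': ['On', 'Off', 'On', 'On', 'Off'],
                   'c': [[0] for _ in range(5)],
                   'd': [[0, 3], [0, 1], [2], [0, 4, 4], [0]]})

def exten(lst):
    return lst + [1]  # new list with 1 appended

def incre(lst):
    return lst[:-1] + [lst[-1] + 1]  # new list, last item + 1 (no in-place mutation)

mask = df.a == df.b
# apply each function only to the rows that meet its condition
df.loc[~mask, 'd'] = df.loc[~mask, 'd'].apply(exten)
df.loc[mask, 'd'] = df.loc[mask, 'd'].apply(incre)
print(df)
```

Output:

```
    a    b    c          d
0  On   On  [0]     [0, 4]
1  On  Off  [0]  [0, 1, 1]
2  On   On  [0]        [3]
3  On   On  [0]  [0, 4, 5]
4  On  Off  [0]     [0, 1]
```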
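If a closer-to-vectorized approach is worth trying, one option (a different technique from the `apply` answer) is to switch column `d` to a long, "exploded" layout with one row per list element, apply both rules with boolean masks and `concat`, and then regroup. This is a minimal sketch under stated assumptions: the example frame is rebuilt from the question, no list in `d` is empty, and the names `flat`, `same`, `row_same` and `extra` are illustrative. Only the middle arithmetic is vectorized. `explode` and the final `groupby(...).agg(list)` still touch every list in Python, so any speed-up over `apply` on 100,000 rows has to be measured.

```python
import pandas as pd

# Example frame rebuilt from the tables in the question
df = pd.DataFrame({'a': ['On'] * 5,
                   'b': ['On', 'Off', 'On', 'On', 'Off'],
                   'c': [[0] for _ in range(5)],
                   'd': [[0, 3], [0, 1], [2], [0, 4, 4], [0]]})

same = (df['a'] == df['b']).to_numpy()

# Long format: one row per list element, indexed by the original row label.
# Assumes no list in d is empty (explode turns [] into NaN).
flat = df['d'].explode().astype('int64')

# True for the last element of each original list
is_last = ~flat.index.duplicated(keep='last')

# Rule 1 (a == b): add 1 to the last element, using boolean masks only
row_same = pd.Series(same, index=df.index).reindex(flat.index).to_numpy()
flat = flat + (is_last & row_same).astype('int64')

# Rule 2 (a != b): append a 1 by adding one new element per such row
extra = pd.Series(1, index=df.index[~same], dtype='int64')
# mergesort is stable, so each appended 1 stays after its list's elements
flat = pd.concat([flat, extra]).sort_index(kind='mergesort')

# Back to one list per original row (per-group Python work again)
df['d'] = flat.groupby(level=0).agg(list)
print(df['d'].tolist())
# [[0, 4], [0, 1, 1], [3], [0, 4, 5], [0, 1]]
```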