Multiple conditions on a pandas DataFrame
Question
I have a list of conditions to run on a dataset in order to sort through a large amount of data.
df = a huge DataFrame.
For example:
Index D1 D2 D3 D5 D6
0 8 5 0 False True
1 45 35 0 True False
2 35 10 1 False True
3 40 5 2 True False
4 12 10 5 False False
5 18 15 13 False True
6 25 15 5 True False
7 35 10 11 False True
8 95 50 0 False False
I have to sort (filter) the df above based on the given orders:
orders = [[A, B],[D, ~E, B], [~C, ~A], [~C, A]...]
#(where A, B, C , D, E are the conditions)
For example:
A = df['D1'].le(50)
B = df['D2'].ge(5)
C = df['D3'].ne(0)
D = df['D1'].ne(False)
E = df['D1'].ne(True)
# In the real scenario, I have 64 such conditions to be run on 5 million records.
That is, I have to run all of these conditions to get the resulting output.
What is the easiest way to achieve the following task using a for loop, map, or .apply?
df = df.loc[A & B]
df = df.loc[D & ~E & B]
df = df.loc[~C & ~A]
df = df.loc[~C & A]
The resulting df would be my expected output.
Here I am more interested in how you would use a loop, map, or .apply to run multiple conditions that are stored in a list, rather than in the resulting output itself.
For example:
for i in orders:
    df = df[all(i)]  # I am not able to implement this logic for each order
Recommended answer
You are looking for the bitwise AND of all the elements inside orders. In which case:
import numpy as np
df = df[np.concatenate(orders).all(0)]
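Since the question asks specifically for a loop-based form, here is a minimal sketch of the same idea written with a for loop. It assumes that every condition is computed once on the original df (so all masks share its index) and that the goal is to keep only the rows that satisfy every order. The sample data, the shortened orders list, and the helper name combine are illustrative only and are not part of the original post.

```python
from functools import reduce
import operator

import pandas as pd

# Small sample mirroring the question's table (only the columns used below).
df = pd.DataFrame({
    "D1": [8, 45, 35, 40, 12, 18, 25, 35, 95],
    "D2": [5, 35, 10, 5, 10, 15, 15, 10, 50],
    "D3": [0, 0, 1, 2, 5, 13, 5, 11, 0],
})

# Conditions are evaluated on the full frame, so every mask has the same index.
A = df["D1"].le(50)
B = df["D2"].ge(5)
C = df["D3"].ne(0)

# Hypothetical, shortened list of orders; each inner list is AND-ed together.
orders = [[A, B], [~C, A]]


def combine(conditions):
    """AND a list of boolean Series element-wise (hypothetical helper)."""
    return reduce(operator.and_, conditions)


# Loop form: fold each order's mask into a running mask, then filter once.
mask = pd.Series(True, index=df.index)
for order in orders:
    mask = mask & combine(order)

result = df[mask]
print(result)
```

Building one combined mask and indexing once avoids re-filtering df on every iteration, which also sidesteps index-alignment issues between a shrinking df and masks computed on the full frame. Note that if orders contains contradictory conditions (for example both A and ~A), the combined mask is all False and the result is empty.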