Multiple conditions on pandas dataframe


Question

I have a list of conditions to run on a dataset in order to filter a large amount of data.

df is a huge DataFrame, for example:

  Index D1  D2  D3   D5      D6
    0   8   5   0  False   True
    1  45  35   0   True  False
    2  35  10   1  False   True
    3  40   5   2   True  False
    4  12  10   5  False  False
    5  18  15  13  False   True
    6  25  15   5   True  False
    7  35  10  11  False   True
    8  95  50   0  False  False

I have to filter the df above based on the given orders:

orders = [[A, B],[D, ~E, B], [~C, ~A], [~C, A]...] 
#(where A, B, C , D, E are the conditions) 

For example:

A = df['D1'].le(50)
B = df['D2'].ge(5)
C = df['D3'].ne(0)
D = df['D1'].ne(False)
E = df['D1'].ne(True)
# In the real scenario, I have 64 such conditions to be run on 5 million records. 

For example, I have to run all of these conditions to get the resulting output.

What is the easiest way to achieve the following, applying each order with a for loop, map, or .apply?

  df = df.loc[A & B]
  df = df.loc[D & ~E & B]
  df = df.loc[~C & ~A]
  df = df.loc[~C & A]

The resulting df would be my expected output.

I am more interested in how you would use a loop, map, or .apply to run multiple conditions that are stored in a list than in the resulting output itself.

For example:

for i in orders:
   df = df[all(i)] # I am not able to implement this logic for each order

Recommended answer

You are looking for the bitwise AND of all the elements inside orders. In that case:

import numpy as np
df = df[np.concatenate(orders).all(0)]
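Since the question asks specifically about a loop, here is a minimal sketch of the same idea written as one. It rebuilds the sample data from the question and uses a hypothetical two-entry `orders` list (an assumption for illustration, not the asker's full list) in which every mask is computed on the original df. Note that the built-in `all(i)` from the question does not work here, because Python would try to take the truth value of a whole Series, which pandas refuses to do.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "D1": [8, 45, 35, 40, 12, 18, 25, 35, 95],
    "D2": [5, 35, 10, 5, 10, 15, 15, 10, 50],
    "D3": [0, 0, 1, 2, 5, 13, 5, 11, 0],
    "D5": [False, True, False, True, False, False, True, False, False],
    "D6": [True, False, True, False, False, True, False, True, False],
})

# Conditions as defined in the question.
A = df["D1"].le(50)
B = df["D2"].ge(5)
C = df["D3"].ne(0)

# Hypothetical subset of orders, chosen so the result is not empty.
orders = [[A, B], [~C, A]]

# Start from an all-True mask and AND in every condition of every order.
mask = pd.Series(True, index=df.index)
for order in orders:
    for cond in order:
        mask &= cond

result = df.loc[mask]
print(result)
```

Accumulating one mask and indexing once avoids creating an intermediate DataFrame for every order, which may matter with millions of rows.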
