pandas: not in, in and between


Question

pandas version: '0.14.0'

I need to write a "not in" condition for a column in a DataFrame.

For the isin case, I use the following to filter for the codes that I need:

h1 = df1[df1['nat_actn_2_3'].isin(['100','101','102','103','104'])]

I want to write a "not in" or "not equal to" condition (not sure which one Python uses) for another column.

So I tried the following:

h1 = df1[df1['csc_auth_12'].notin(['N6M','YEM','YEL','YEM'])]

h1 = df1[df1['csc_auth_12'] not in (['N6M','YEM','YEL','YEM'])]

and:

h1.query(['N6M','YEM','YEL','YEM'] not in ['csc_auth_12'])

What I really want is to filter N6M, YEM, YEL and YEM out of the data set.

I'm also interested in how to write a "between" condition.

For the filter below I had to type in all of the 500-series codes by hand. I would like to write something like:

h1 = df1[df1['nat_actn_2_3'].isin['100','102'] and isbetween [500 & 599])]

But this is what I have:

h1 = df1[df1['nat_actn_2_3'].isin(['100','101','102','103','104','107','108','112','115','117','120','122','124','128',
                             '130','132','132','140','141','142','143','145','146','147','148','149','170','171',
                             '172','173','179','190','198','199','501','502','503','504','505','506','507','508',
                             '509','510','511','512','513','514','515','516','517','518','519','520','521','522',
                             '523','524','525','526','527','528','529','530','531','532','533','534','535','536',
                             '537','538','539','540','541','542','543','544','545','546','547','548','549','550',
                             '551','552','553','554','555','556','557','558','559','560','561','562','563','564',
                             '565','566','567','568','569','570','571','572','573','574','575','576','577','578',
                             '579','580','581','582','583','584','585','586','587','588','589','590','591','592',
                             '593','594','595','596','597','598','599','702','721','740','953','955'])]

Any suggestions?

Thanks.

Recommended answer

Negate the boolean condition by using ~ to invert the mask:

h1 = df1[~df1['nat_actn_2_3'].isin(['100','101','102','103','104'])]
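Applied to the column the question actually wants to filter, the same idea might look like the sketch below. The sample rows are made up for illustration; only the column name csc_auth_12 and the excluded codes come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df1 = pd.DataFrame({"csc_auth_12": ["N6M", "ABC", "YEL", "YEM", "XYZ"]})

excluded = ["N6M", "YEM", "YEL"]

# isin() returns a boolean Series: True where the value is in the list.
mask = df1["csc_auth_12"].isin(excluded)

# ~ flips every True/False, so this keeps the rows NOT in the list.
h1 = df1[~mask]
print(h1)
```

Storing the mask in a variable first is optional, but it tends to make the negation easier to read than a single long expression.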

As for notin and not in: the former doesn't exist as a method, and the latter will likely raise a ValueError (the "truth value is ambiguous" error), because you're trying to use in with an array and pandas does not work like that.
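A small check of both claims, using a made-up three-row frame (the data is an assumption; the column name is from the question):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df1 = pd.DataFrame({"csc_auth_12": ["N6M", "ABC", "YEL"]})

# There is no Series.notin method.
has_notin = hasattr(pd.Series, "notin")

# Python's "not in" compares each list item with the whole Series and then
# needs a single True/False, which pandas refuses to produce.
try:
    df1["csc_auth_12"] not in ["N6M", "YEM", "YEL"]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The element-wise equivalent is ~df1['csc_auth_12'].isin([...]), which returns one boolean per row instead of a single answer for the whole column.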

For the second question you need to combine your boolean conditions like so:

h1 = df1[(df1['nat_actn_2_3'].isin(['100','102'])) | ((df1['nat_actn_2_3'] > 500) & (df1['nat_actn_2_3'] < 599))]

I'm assuming from your text that you want rows that are either equal to 100/102 or between 500 and 599. It's unclear whether you want to include 500 and 599 themselves; if so, just change the comparisons to >= and <= respectively.
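The codes in the question are quoted strings, so comparing them directly with the integers 500 and 599 may fail or give unexpected results. The sketch below assumes every code is numeric text, converts the column to integers first, and uses Series.between, which includes both endpoints by default. The sample rows are made up.

```python
import pandas as pd

# Hypothetical sample data; the codes are strings, as in the question.
df1 = pd.DataFrame(
    {"nat_actn_2_3": ["100", "101", "102", "500", "550", "599", "702"]}
)

# Assumption: every code is numeric text, so astype(int) succeeds.
codes = df1["nat_actn_2_3"].astype(int)

# Rows equal to 100/102, or anywhere in 500..599 (both ends included).
mask = df1["nat_actn_2_3"].isin(["100", "102"]) | codes.between(500, 599)
h1 = df1[mask]
print(h1)
```

If some codes are not numeric, the conversion step would need a different approach, so treat this as one possible way to avoid typing out every code rather than a drop-in replacement.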

Here you use the bitwise operators & and | in place of and and or respectively. You also need to wrap each condition in () because & and | bind more tightly than the comparison operators.
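To see why the parentheses matter, compare the two forms on a small made-up Series. Without parentheses, Python evaluates 500 & s first and then treats the rest as a chained comparison, which needs a single True/False and therefore raises.

```python
import pandas as pd

# Hypothetical integer codes for illustration only.
s = pd.Series([100, 500, 550, 599, 702])

# Each comparison is parenthesised, so & combines two boolean Series.
good = (s > 500) & (s < 599)
print(good.tolist())

# Without parentheses this parses as: s > (500 & s) < 599
try:
    bad = s > 500 & s < 599
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```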
