pandas not in, in and between
Question
pandas version: '0.14.0'
I need to do a "not in" statement for a column in a dataframe.
For the isin statement, I use the following to filter for the codes that I need:
h1 = df1[df1['nat_actn_2_3'].isin(['100','101','102','103','104'])]
I want to do a "not in" or "not equal to" statement (not sure which one is used in Python) for another column.
So I tried the following:
h1 = df1[df1['csc_auth_12'].notin(['N6M','YEM','YEL','YEM'])]
h1 = df1[df1['csc_auth_12'] not in (['N6M','YEM','YEL','YEM'])]
and:
h1.query(['N6M','YEM','YEL','YEM'] not in ['csc_auth_12'])
I really want to filter out N6M, YEM, YEL and YEM from the data set.
I'm also interested in how to do a "between" statement.
For the following, I had to manually type in all the 500 codes. I would like to do something like:
h1 = df1[df1['nat_actn_2_3'].isin['100','102'] and isbetween [500 & 599])]
But this is what I have:
h1 = df1[df1['nat_actn_2_3'].isin(['100','101','102','103','104','107','108','112','115','117','120','122','124','128',
'130','132','132','140','141','142','143','145','146','147','148','149','170','171',
'172','173','179','190','198','199','501','502','503','504','505','506','507','508',
'509','510','511','512','513','514','515','516','517','518','519','520','521','522',
'523','524','525','526','527','528','529','530','531','532','533','534','535','536',
'537','538','539','540','541','542','543','544','545','546','547','548','549','550',
'551','552','553','554','555','556','557','558','559','560','561','562','563','564',
'565','566','567','568','569','570','571','572','573','574','575','576','577','578',
'579','580','581','582','583','584','585','586','587','588','589','590','591','592',
'593','594','595','596','597','598','599','702','721','740','953','955'])]
Any suggestions?
Thanks.
Recommended answer
Negate the boolean condition using ~ to invert the mask:
h1 = df1[~df1['nat_actn_2_3'].isin(['100','101','102','103','104'])]
As for notin and not in: the former doesn't exist, and the latter will likely raise a ValueError (the ambiguous truth value error), because you're trying to use in with an array and pandas does not work like that.
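A minimal sketch of the "not in" filter, using a small made-up frame (the sample rows and the names excluded, kept and kept_query are illustrative, not from the question). It also shows that DataFrame.query accepts a not in expression, which may be what the query attempt in the question was aiming for, and that plain Python not in on a Series raises ValueError:

```python
import pandas as pd

# Hypothetical sample data standing in for df1 from the question.
df1 = pd.DataFrame({
    "csc_auth_12": ["N6M", "ABC", "YEM", "YEL", "XYZ"],
    "value": [1, 2, 3, 4, 5],
})
excluded = ["N6M", "YEM", "YEL"]

# isin builds a boolean mask; ~ inverts it, giving "not in".
kept = df1[~df1["csc_auth_12"].isin(excluded)]

# The same filter written as a query string; @ refers to a local variable.
kept_query = df1.query("csc_auth_12 not in @excluded")

# Plain Python `not in` tries to reduce an element-wise comparison
# to a single True/False, which pandas refuses to do.
try:
    df1["csc_auth_12"] not in excluded
    raised = False
except ValueError:
    raised = True

print(kept)
print(raised)
```

Both kept and kept_query should contain only the rows whose code is not in the excluded list.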
For the second question, you need to combine your boolean conditions like so:
h1 = df1[(df1['nat_actn_2_3'].isin(['100','102'])) | ((df1['nat_actn_2_3'] > 500) & (df1['nat_actn_2_3'] < 599))]
I'm assuming from your text that you want rows that are either equal to 100 or 102, or between 500 and 599. It is unclear whether you mean to include 500 and 599 themselves; if so, change the comparisons to >= and <= respectively.
Here you use the bitwise operators & and | for and and or respectively. You also need to wrap each condition in parentheses because of operator precedence: & and | bind more tightly than comparison operators such as > and <.
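A short sketch of the combined filter on made-up data. It assumes the codes are stored as numeric strings, as the quoted values in the question suggest, so they are converted with pd.to_numeric before the range check; if the column already holds integers, the conversion is unnecessary and the isin list should contain integers instead. Series.between is used here as an alternative to the pair of comparisons and includes both endpoints by default:

```python
import pandas as pd

# Hypothetical sample data; the real df1 is not shown in the question.
df1 = pd.DataFrame(
    {"nat_actn_2_3": ["100", "101", "102", "500", "550", "599", "600", "702"]}
)

# Assumption: every code is a numeric string, so it converts cleanly.
codes = pd.to_numeric(df1["nat_actn_2_3"])

# Rows equal to 100 or 102, or in the range 500..599 (endpoints included).
mask = df1["nat_actn_2_3"].isin(["100", "102"]) | codes.between(500, 599)
h1 = df1[mask]

# Strict version with explicit comparisons; each condition is parenthesised
# because & binds more tightly than > and <.
strict = df1[(codes > 500) & (codes < 599)]

print(h1["nat_actn_2_3"].tolist())
print(strict["nat_actn_2_3"].tolist())
```

With a mask like this, the long hand-typed list of 5xx codes in the question can shrink to the handful of codes outside that range plus one range check.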