Validating dataframe column data
Problem description
I have the pseudocode below, which I need to write using pandas.
if group_min_size && group_max_size
  if group_min_size == 0 && group_max_size > 0
    if group_max_size >= 2
      errors.add(:group_min_size, "must be greater than or equal to 2 and less than or equal to group_max_size (#{group_max_size})")
    end
    if group_max_size < 2
      errors.add(:group_min_size, "must be greater than 2")
      errors.add(:group_max_size, "must be greater than 2")
    end
  end
  if group_min_size > 0 && group_max_size == 0
    if group_min_size >= 2
      errors.add(:group_max_size, "must be greater than or equal to #{group_min_size}")
    end
    if group_min_size < 2
      errors.add(:group_min_size, "must be greater than 2")
      errors.add(:group_max_size, "must be greater than 2")
    end
  end
end
I tried to break it into smaller parts and wrote something like the following:
m8 = ((~df['group_min_size'].notna() & ~df['group_min_size'].notna()) | ((~df['group_min_size'] == 0) & (~df['group_max_size'] > 2)) | (df['group_max_size'] >= 2))
which is meant to cover this part of the pseudocode:
if group_min_size == 0 && group_max_size > 0
  if group_max_size >= 2
    errors.add(:group_min_size, "must be greater than or equal to 2 and less than or equal to group_max_size (#{group_max_size})")
  end
But it does not work as expected.
Below is my test data:
   group_min_size  group_max_size
0             0.0             1.0
1            10.0            20.0
2             0.0             3.0
3             3.0             0.0
4             NaN             NaN
5             2.0             2.0
6             2.0             2.0
7             2.0             2.0
8             2.0             2.0
Based on the pseudocode logic, the output should be:
False
True
False
False
True
True
True
True
True
How do I write this logic in pandas?
Recommended answer
Just work through the problem step by step. Begin by creating a boolean Series for each comparison:
min_equal_0 = df['group_min_size'] == 0
min_above_0 = df['group_min_size'] > 0
min_above_equal_2 = df['group_min_size'] >= 2
min_below_2 = df['group_min_size'] < 2
max_equal_0 = df['group_max_size'] == 0
max_above_0 = df['group_max_size'] > 0
max_above_equal_2 = df['group_max_size'] >= 2
max_below_2 = df['group_max_size'] < 2
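To try these comparisons, the test data from the question can be rebuilt as a DataFrame. The following is a minimal sketch that assumes the missing values in row 4 are `numpy.nan`; it also shows that comparisons against NaN evaluate to False, which matters for how that row is handled later.

```python
import numpy as np
import pandas as pd

# Test data from the question; row 4 is missing in both columns (assumed np.nan).
df = pd.DataFrame({
    "group_min_size": [0, 10, 0, 3, np.nan, 2, 2, 2, 2],
    "group_max_size": [1, 20, 3, 0, np.nan, 2, 2, 2, 2],
})

min_equal_0 = df["group_min_size"] == 0
max_above_0 = df["group_max_size"] > 0

# Any comparison with NaN is False, so row 4 is False in both Series.
print(min_equal_0.tolist())
print(max_above_0.tolist())
```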
Now we can build the masks according to the pseudocode:
first_mask = ~(min_equal_0 & max_above_0 & (max_below_2 | max_above_equal_2))
second_mask = ~(max_equal_0 & min_above_0 & (min_below_2 | min_above_equal_2))
If we combine the two:
>> first_mask & second_mask
0 False
1 True
2 False
3 False
4 True
5 True
6 True
7 True
8 True
dtype: bool
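One observation on the masks: for any value above 0, "below 2 or at least 2" is always true, so the inner term can be dropped without changing the result. The sketch below applies that simplification to the question's test data (rebuilt here, assuming `numpy.nan` for the missing row) and prints the combined result.

```python
import numpy as np
import pandas as pd

# Test data from the question; row 4 is missing in both columns (assumed np.nan).
df = pd.DataFrame({
    "group_min_size": [0, 10, 0, 3, np.nan, 2, 2, 2, 2],
    "group_max_size": [1, 20, 3, 0, np.nan, 2, 2, 2, 2],
})
min_size = df["group_min_size"]
max_size = df["group_max_size"]

# (max < 2) | (max >= 2) holds for every non-missing max, so it is omitted.
first_mask = ~((min_size == 0) & (max_size > 0))
# Same reasoning for the min-side term in the second branch.
second_mask = ~((max_size == 0) & (min_size > 0))

is_valid = first_mask & second_mask
print(is_valid.tolist())
# [False, True, False, False, True, True, True, True, True]
```

Row 4 comes out True because every comparison against NaN is False, so neither error branch fires for it.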
If you want to treat NaN as False, just add null checks for both columns:
min_is_not_null = df['group_min_size'].notnull()
max_is_not_null = df['group_max_size'].notnull()
>> min_is_not_null & max_is_not_null & first_mask & second_mask
0 False
1 True
2 False
3 False
4 False
5 True
6 True
7 True
8 True
dtype: bool
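Both variants can be wrapped in a single helper. This is only a sketch: the function name `validate_group_sizes` and the flag `nan_is_valid` are hypothetical and not part of the original answer. The default keeps missing rows valid, which matches the expected output in the question, where the outer `if group_min_size && group_max_size` appears to skip validation when a value is missing.

```python
import numpy as np
import pandas as pd

def validate_group_sizes(df: pd.DataFrame, nan_is_valid: bool = True) -> pd.Series:
    """Hypothetical helper: True for rows that trigger no error in the pseudocode."""
    min_size = df["group_min_size"]
    max_size = df["group_max_size"]
    # A row fails when one bound is exactly 0 and the other is positive.
    fails = ((min_size == 0) & (max_size > 0)) | ((max_size == 0) & (min_size > 0))
    valid = ~fails
    if not nan_is_valid:
        # Additionally require both values to be present.
        valid = valid & min_size.notna() & max_size.notna()
    return valid

# Test data from the question; row 4 is missing in both columns (assumed np.nan).
df = pd.DataFrame({
    "group_min_size": [0, 10, 0, 3, np.nan, 2, 2, 2, 2],
    "group_max_size": [1, 20, 3, 0, np.nan, 2, 2, 2, 2],
})
print(validate_group_sizes(df).tolist())
print(validate_group_sizes(df, nan_is_valid=False).tolist())
```

The only difference between the two calls is row 4, which flips from True to False when missing values are treated as invalid.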