Python Pandas: check if a value occurs more than once in the same day


Problem description

I have a Pandas dataframe as below. What I am trying to do is check if a station has variable yyy and any other variable on the same day (as in the case of station1). If this is true I need to delete the whole row containing yyy.

Currently I am doing this using iterrows() and looping to search the days in which this variable appears, changing the variable to something like "delete me", building a new dataframe from this (because pandas doesn't support replacing in place) and filtering the new dataframe to get rid of the unwanted rows. This works now because my dataframes are small, but is not likely to scale.

Question: This seems like a very "non-Pandas" way to do this. Is there some other method of removing the unwanted rows?

                dateuse         station         variable1
0   2012-08-12 00:00:00        station1               xxx
1   2012-08-12 00:00:00        station1               yyy
2   2012-08-23 00:00:00        station2               aaa
3   2012-08-23 00:00:00        station3               bbb
4   2012-08-25 00:00:00        station4               ccc
5   2012-08-25 00:00:00        station4               ccc
6   2012-08-25 00:00:00        station4               ccc

Solution

I might index using a boolean array. If I understand what you're after, we want to delete the rows that contain yyy and belong to a dateuse/station combination that occurs more than once.

We can use transform to broadcast the size of each dateuse/station group back to the length of the dataframe, and then flag the rows whose group has more than one row. Then we can & this with the mask marking where the yyy values are.

>>> multiple = df.groupby(["dateuse", "station"])["variable1"].transform(len) > 1
>>> must_be_isolated = df["variable1"] == "yyy"
>>> df[~(multiple & must_be_isolated)]
               dateuse   station variable1
0  2012-08-12 00:00:00  station1       xxx
2  2012-08-23 00:00:00  station2       aaa
3  2012-08-23 00:00:00  station3       bbb
4  2012-08-25 00:00:00  station4       ccc
5  2012-08-25 00:00:00  station4       ccc
6  2012-08-25 00:00:00  station4       ccc
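
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt from the sample in the question. Using transform("size") instead of transform(len) is my own substitution, not part of the original answer. The stricter nunique variant at the end is also an assumption about what the asker may want: it drops yyy only when the group holds at least two distinct values, so a group made up solely of repeated yyy rows would be kept.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "dateuse": pd.to_datetime(
            ["2012-08-12", "2012-08-12", "2012-08-23", "2012-08-23",
             "2012-08-25", "2012-08-25", "2012-08-25"]
        ),
        "station": ["station1", "station1", "station2", "station3",
                    "station4", "station4", "station4"],
        "variable1": ["xxx", "yyy", "aaa", "bbb", "ccc", "ccc", "ccc"],
    }
)

groups = df.groupby(["dateuse", "station"])["variable1"]

# Row count of each (dateuse, station) group, broadcast back to every row.
multiple = groups.transform("size") > 1
must_be_isolated = df["variable1"] == "yyy"

# Keep everything except yyy rows that share their group with other rows.
result = df[~(multiple & must_be_isolated)]
print(result)

# Assumed stricter variant: drop yyy only if the group also contains
# some other distinct value (not just repeated yyy rows).
has_other_value = groups.transform("nunique") > 1
strict_result = df[~(has_other_value & must_be_isolated)]
```

On the sample data both masks select the same rows. If dateuse carried real times of day rather than midnight, grouping on df["dateuse"].dt.normalize() would likely be needed to compare by calendar day.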
