Python Pandas: check if a value occurs more than once in the same day
I have a Pandas dataframe as below. What I am trying to do is check whether a station has variable yyy and any other variable on the same day (as in the case of station1). If so, I need to delete the whole row containing yyy.
Currently I am doing this using iterrows() and looping to find the days on which this variable appears, changing the variable to something like "delete me", building a new dataframe from the result (because pandas doesn't support replacing in place) and filtering the new dataframe to get rid of the unwanted rows. This works for now because my dataframes are small, but it is not likely to scale.
Question: This seems like a very "non-Pandas" way to do this. Is there some other method of removing the unwanted rows?
               dateuse   station  variable1
0  2012-08-12 00:00:00  station1        xxx
1  2012-08-12 00:00:00  station1        yyy
2  2012-08-23 00:00:00  station2        aaa
3  2012-08-23 00:00:00  station3        bbb
4  2012-08-25 00:00:00  station4        ccc
5  2012-08-25 00:00:00  station4        ccc
6  2012-08-25 00:00:00  station4        ccc
I might index using a boolean array. We want to delete rows (if I understand what you're after, anyway!) which have yyy and which belong to a dateuse/station combination that occurs more than once.
We can use transform to broadcast the size of each dateuse/station group up to the length of the dataframe, and then flag the rows in groups which have length > 1. Then we can & this with a mask of where the yyy values are, and keep everything that is not flagged by both.
>>> multiple = df.groupby(["dateuse", "station"])["variable1"].transform(len) > 1
>>> must_be_isolated = df["variable1"] == "yyy"
>>> df[~(multiple & must_be_isolated)]
               dateuse   station  variable1
0  2012-08-12 00:00:00  station1        xxx
2  2012-08-23 00:00:00  station2        aaa
3  2012-08-23 00:00:00  station3        bbb
4  2012-08-25 00:00:00  station4        ccc
5  2012-08-25 00:00:00  station4        ccc
6  2012-08-25 00:00:00  station4        ccc
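For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same boolean-mask idea. It rebuilds the sample frame from the question (parsing dateuse as datetimes is an assumption; the original dtype is not shown) and uses transform("size") in place of transform(len), which should give the same counts. The second mask, based on nunique, is not part of the answer above: it is one possible reading of "yyy and any other variable", and only differs when a group contains several yyy rows and nothing else.

```python
import pandas as pd

# Sample data copied from the question; datetime parsing is an assumption.
df = pd.DataFrame(
    {
        "dateuse": pd.to_datetime(
            ["2012-08-12", "2012-08-12", "2012-08-23", "2012-08-23",
             "2012-08-25", "2012-08-25", "2012-08-25"]
        ),
        "station": ["station1", "station1", "station2", "station3",
                    "station4", "station4", "station4"],
        "variable1": ["xxx", "yyy", "aaa", "bbb", "ccc", "ccc", "ccc"],
    }
)

groups = df.groupby(["dateuse", "station"])["variable1"]

# True for rows whose dateuse/station group has more than one row.
multiple = groups.transform("size") > 1

# Variant (not from the answer): True only if the group holds more than
# one distinct value, i.e. yyy really sits next to some other variable.
distinct = groups.transform("nunique") > 1

is_yyy = df["variable1"] == "yyy"

# Drop rows that are yyy and share their group with other rows.
by_size = df[~(multiple & is_yyy)]
by_nunique = df[~(distinct & is_yyy)]

print(by_size)
```

On this sample both masks drop only row 1, so either version reproduces the output shown above.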