Split a DataFrame by rows containing zero in python pandas
Question
Apologies if this is very basic and has been answered elsewhere (I couldn't find it, but I may just be using the wrong terminology).
I would like to split a DataFrame wherever a certain number of successive 0s appears in a column. Let's say I have this DataFrame:
import pandas as pd
idx = pd.date_range('2020-01-01', periods=6, freq='D')
df = pd.DataFrame({'A': range(6), 'B': [3, 2, 0, 0, 1, 2]}, index=idx)
A B
2020-01-01 0 3
2020-01-02 1 2
2020-01-03 2 0
2020-01-04 3 0
2020-01-05 4 1
2020-01-06 5 2
What I'd like to arrive at is two DataFrames that look like this:
A B
2020-01-01 0 3
2020-01-02 1 2
A B
2020-01-05 4 1
2020-01-06 5 2
I suspect it can be done with groupby and possibly a lambda (?), but I didn't have any luck trying...
Recommended answer
Here is a not-so-elegant solution, which will nevertheless get you to the groupby you need :)
df2 = df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))
df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()
df2[df2['B'].notnull()].groupby('group')
If you inspect df2 (I'm creating a new DataFrame in case you want to keep the original as well, but you can chain the operations if need be), it now looks like this:
A B group
2020-01-01 0.0 3.0 1
2020-01-02 1.0 2.0 1
2020-01-03 NaN NaN 1
2020-01-04 NaN NaN 1
2020-01-05 4.0 1.0 2
2020-01-06 5.0 2.0 2
So now you can filter out the rows where df2['B'] is null (essentially the rows where two or more consecutive 0s appeared), and then groupby the new column group.
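To actually end up with the two separate DataFrames from the question, the filter and groupby described above can be materialized into a list. This is a minimal sketch that reuses the answer's code unchanged; the names kept and parts are my own, and looking the rows up in the original df is an assumption about what you want (it restores the integer dtypes that df.mask() turned into floats).

```python
import pandas as pd

idx = pd.date_range('2020-01-01', periods=6, freq='D')
df = pd.DataFrame({'A': range(6), 'B': [3, 2, 0, 0, 1, 2]}, index=idx)

# Same two steps as in the answer: hide the zero runs, then label the groups.
df2 = df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))
df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()

# Keep only the rows that were not hidden.
kept = df2[df2['B'].notnull()]

# One DataFrame per group, taken from the original df so dtypes stay int.
parts = [df.loc[part.index] for _, part in kept.groupby('group')]

for part in parts:
    print(part, end='\n\n')
```

Each element of parts is an ordinary DataFrame, so parts[0] and parts[1] correspond to the two frames shown in the question.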
Here is what happens:
df.mask((df['B'] == 0) & ((df['B'].shift(1) == 0) | (df['B'].shift(-1) == 0)))
If the B value is equal to 0 and either the previous or the next one is also equal to 0, hide these rows (replace them with NaN via df.mask()).
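To see what the condition passed to df.mask() evaluates to, it can help to compute it on a bare Series. The sketch below uses the B column from the question plus a second, made-up Series c (my own example, not from the question) to show that an isolated zero is not hidden.

```python
import pandas as pd

b = pd.Series([3, 2, 0, 0, 1, 2])
is_zero = b == 0
# True where the previous or the next value is 0 (NaN at the edges compares False).
has_zero_neighbour = (b.shift(1) == 0) | (b.shift(-1) == 0)
cond = is_zero & has_zero_neighbour
print(cond.tolist())  # [False, False, True, True, False, False]

# An isolated zero has no zero neighbour, so it is left alone.
c = pd.Series([3, 0, 2, 0, 0, 1])
cond_c = (c == 0) & ((c.shift(1) == 0) | (c.shift(-1) == 0))
print(cond_c.tolist())  # [False, False, False, True, True, False]
```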
df2['group'] = (df2['B'].shift(1).isnull() & df2['B'].notnull()).cumsum()
Create an indicator column group, just to let Pandas know what to groupby (you can also group directly by that whole expression; I just want to make the step clear). A new group starts wherever the previous value of B is null and the current value is not null. Taking the cumulative sum of that flag gives you this fabricated "id" to groupby.
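The question asks about "a certain amount" of successive 0s, while the mask above is tied to runs of two or more. One possible generalization, offered only as a sketch: measure the length of each run of zeros and drop the runs that reach a threshold. The function name split_on_zero_runs and the parameter min_run are hypothetical, and this swaps the shift-based neighbour check for a run-length computation instead of reproducing the answer's method.

```python
import pandas as pd

def split_on_zero_runs(df, column, min_run=2):
    """Hypothetical helper: split df wherever `column` holds at least
    `min_run` consecutive zeros, dropping those zero rows."""
    is_zero = df[column].eq(0)
    # Give every run of identical is_zero values its own id.
    run_id = is_zero.ne(is_zero.shift()).cumsum()
    # Length of the run that each row belongs to.
    run_len = is_zero.groupby(run_id).transform('size')
    # Rows to drop: zeros sitting in a run that is long enough.
    drop = is_zero & (run_len >= min_run)
    # A block starts at a kept row whose predecessor was dropped
    # (the first row counts as following a dropped row).
    prev_dropped = drop.shift(1, fill_value=True).astype(bool)
    block = (~drop & prev_dropped).cumsum()
    return [part for _, part in df[~drop].groupby(block[~drop])]

idx = pd.date_range('2020-01-01', periods=6, freq='D')
df = pd.DataFrame({'A': range(6), 'B': [3, 2, 0, 0, 1, 2]}, index=idx)

parts = split_on_zero_runs(df, 'B', min_run=2)
for part in parts:
    print(part, end='\n\n')
```

With min_run=2 this yields the same two frames as the answer; with min_run=3 the two zeros no longer count as a separator, so the whole frame comes back as a single piece.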