How to filter rows based on a sequence-related constraint?
Problem description
I have the following DataFrame:
df =
ID TYPE VD_0 VD_1 VD_2 VD_3
1 ABC V1234 456 123 564
2 DBC 456 A45 123 564
3 ABD 456 V1234 456 123
4 ABD 123 V1234 SSW 123
The following list contains values that can appear in VD_0, VD_1, VD_2 and VD_3:
myList = ['V1234', '456', 'A45']
I want to get only those rows in df that have 2 consecutive occurrences of values from myList in adjacent columns among VD_0, VD_1, VD_2 and VD_3.
The result should look like this:
result =
ID TYPE VD_0 VD_1 VD_2 VD_3
1 ABC V1234 456 123 564
2 DBC 456 A45 123 564
3 ABD 456 V1234 456 123
For example, in the row with ID 1 the values of VD_0 and VD_1 are V1234 and 456, respectively, and both of these values belong to myList. The same logic applies to the rows with ID 2 (456, A45) and ID 3 (456, V1234).
How can I do this?
Recommended answer
I agree with the beginning of MaxU's answer, but the end can be simpler, if I understand correctly. The filter you want should keep rows that contain 2 consecutive matches from your list. Take the isin result (True where a value is in myList) and sum it over each pair of adjacent columns; a pair in which both values match sums to 2. This is a 2-period rolling window sum along axis=1. Then take the maximum of each row: rows with at least one matching pair have a maximum greater than or equal to 2:
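subset = df.filter(like='VD_')  # keep only the VD_* columns
# True where a value is in myList; sum adjacent column pairs; keep rows whose best pair sums to 2
df[subset.isin(myList).rolling(2, axis=1).sum().max(axis=1) >= 2]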
Out[26]:
ID TYPE VD_0 VD_1 VD_2 VD_3
0 1 ABC V1234 456 123 564
1 2 DBC 456 A45 123 564
2 3 ABD 456 V1234 456 123
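For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes that every VD_* value is stored as a string (the question does not say), and it rebuilds the sample frame by hand. Because recent pandas releases deprecate the axis argument of rolling, the sketch transposes the boolean frame and rolls down the rows instead, which should give the same windows as rolling(2, axis=1). The names hits, pair_sums and result are only illustrative.

```python
import pandas as pd

# Sample data from the question; VD_* values are assumed to be strings.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "TYPE": ["ABC", "DBC", "ABD", "ABD"],
    "VD_0": ["V1234", "456", "456", "123"],
    "VD_1": ["456", "A45", "V1234", "V1234"],
    "VD_2": ["123", "123", "456", "SSW"],
    "VD_3": ["564", "564", "123", "123"],
})
myList = ["V1234", "456", "A45"]

# 1 where a VD_* cell holds a value from myList, 0 otherwise.
hits = df.filter(like="VD_").isin(myList).astype(int)

# Transpose so each original row becomes a column, take a 2-period
# rolling sum down the VD_* axis, then transpose back.
pair_sums = hits.T.rolling(2).sum().T

# A row qualifies when some pair of adjacent columns sums to 2
# (max skips the NaN produced for the first, incomplete window).
result = df[pair_sums.max(axis=1) >= 2]
print(result)
```

With the sample data this keeps the rows with ID 1, 2 and 3 and drops ID 4, whose only match (V1234 in VD_1) has no matching neighbour.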