当列数是动态的时,如何从数据帧中过滤数据? [英] How to filter data from a data frame when the number of columns are dynamic?
问题描述
我有一个如下所示的数据框
I have a data frame like below
A_Name B_Detail Value_B Value_C Value_D ......
0 AA X1 1.2 0.5 -1.3 ......
1 BB Y1 0.76 -0.7 0.8 ......
2 CC Z1 0.7 -1.3 2.5 ......
3 DD L1 0.9 -0.5 0.4 ......
4 EE M1 1.3 1.8 -1.3 ......
5 FF N1 0.7 -0.8 0.9 ......
6 GG K1 -2.4 -1.9 2.1 ......
这只是数据帧的一个示例,我可以有n个列,例如(Value_A,Value_B,Value_C,........... Value_N)
This is just a sample of data frame, I can have n number of columns like (Value_A, Value_B, Value_C, ........... Value_N)
现在我想过滤所有列(Value_A,Value_B,Value_C,....)的绝对值小于1的所有行.
Now i want to filter all rows where absolute value of all columns (Value_A, Value_B, Value_C, ....) is less than 1.
如果列数有限,则可以通过简单地在数据帧中的列上放置"and"条件来过滤数据,但是在这种情况下我不知道该怎么做.
If you have limited number of columns, you can filter the data by simply putting 'and' condition on columns in dataframe, but I am not able to figure out what to do in this case.
我不知道这些列的数量是多少,我唯一知道的是此类列将以'Value'作为前缀.
I don't know what would be number of such columns, the only thing I know that such columns would be prefixed with 'Value'.
在上述情况下,输出应类似于
In above case output should be like
A_Name B_Detail Value_B Value_C Value_D ......
1 BB Y1 0.76 -0.7 0.8 ......
3 DD L1 0.9 -0.5 0.4 ......
5 FF N1 0.7 -0.8 0.9 ......
推荐答案
使用 filter
与 abs
和 all
创建mask
,然后创建 boolean indexing
:
Use filter
with abs
and all
for creating mask
and then boolean indexing
:
mask = (df.filter(like='Value').abs() < 1).all(axis=1)
print (mask)
0 False
1 True
2 False
3 True
4 False
5 True
6 False
dtype: bool
print (df[mask])
A_Name B_Detail Value_B Value_C Value_D
1 BB Y1 0.76 -0.7 0.8
3 DD L1 0.90 -0.5 0.4
5 FF N1 0.70 -0.8 0.9
时间中的所有组合:
#len df = 70k, 5 columns
df = pd.concat([df]*10000).reset_index(drop=True)
In [47]: %timeit (df[(df.filter(like='Value').abs() < 1).all(axis=1)])
100 loops, best of 3: 7.48 ms per loop
In [48]: %timeit (df[df.filter(regex=r'Value').abs().lt(1).all(1)])
100 loops, best of 3: 7.02 ms per loop
In [49]: %timeit (df[df.filter(like='Value').abs().lt(1).all(1)])
100 loops, best of 3: 7.02 ms per loop
In [50]: %timeit (df[(df.filter(regex=r'Value').abs() < 1).all(axis=1)])
100 loops, best of 3: 7.3 ms per loop
#len df = 70k, 5k columns
df = pd.concat([df]*10000).reset_index(drop=True)
df = pd.concat([df]*1000, axis=1)
#only for testing, create unique columns names
df.columns = df.columns.str[:-1] + [str(col) for col in list(range(df.shape[1]))]
print (df)
In [75]: %timeit ((df[(df.filter(like='Value').abs() < 1).all(axis=1)]))
1 loop, best of 3: 10.3 s per loop
In [76]: %timeit ((df[(df.filter(regex=r'Value').abs() < 1).all(axis=1)]))
1 loop, best of 3: 10.3 s per loop
In [77]: %timeit (df[df.filter(regex=r'Value').abs().lt(1).all(1)])
1 loop, best of 3: 10.4 s per loop
In [78]: %timeit (df[df.filter(like='Value').abs().lt(1).all(1)])
1 loop, best of 3: 10.1 s per loop
这篇关于当列数是动态的时,如何从数据帧中过滤数据?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!