当列数是动态的时,如何从数据帧中过滤数据? [英] How to filter data from a data frame when the number of columns are dynamic?

查看:49
本文介绍了当列数是动态的时,如何从数据帧中过滤数据?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个如下所示的数据框

I have a data frame like below

    A_Name  B_Detail  Value_B  Value_C   Value_D ......
0   AA      X1        1.2      0.5       -1.3    ......
1   BB      Y1        0.76     -0.7      0.8     ......
2   CC      Z1        0.7      -1.3      2.5     ......
3   DD      L1        0.9      -0.5      0.4     ......
4   EE      M1        1.3      1.8       -1.3    ......
5   FF      N1        0.7      -0.8      0.9     ......
6   GG      K1        -2.4     -1.9      2.1     ......

这只是数据帧的一个示例,我可以有n个列,例如(Value_A,Value_B,Value_C,........... Value_N)

This is just a sample of data frame, I can have n number of columns like (Value_A, Value_B, Value_C, ........... Value_N)

现在我想过滤所有列(Value_A,Value_B,Value_C,....)的绝对值小于1的所有行.

Now i want to filter all rows where absolute value of all columns (Value_A, Value_B, Value_C, ....) is less than 1.

如果列数有限,则可以通过简单地在数据帧中的列上放置"and"条件来过滤数据,但是在这种情况下我不知道该怎么做.

If you have limited number of columns, you can filter the data by simply putting 'and' condition on columns in dataframe, but I am not able to figure out what to do in this case.

我不知道这些列的数量是多少,我唯一知道的是此类列将以'Value'作为前缀.

I don't know what would be number of such columns, the only thing I know that such columns would be prefixed with 'Value'.

在上述情况下,输出应类似于

In above case output should be like

    A_Name  B_Detail  Value_B  Value_C   Value_D ......
1   BB      Y1        0.76     -0.7      0.8     ......
3   DD      L1        0.9      -0.5      0.4     ......
5   FF      N1        0.7      -0.8      0.9     ......

推荐答案

使用 filter abs all 创建mask,然后创建 boolean indexing :

Use filter with abs and all for creating mask and then boolean indexing:

mask = (df.filter(like='Value').abs() < 1).all(axis=1)
print (mask)
0    False
1     True
2    False
3     True
4    False
5     True
6    False
dtype: bool

print (df[mask])
  A_Name B_Detail  Value_B  Value_C  Value_D
1     BB       Y1     0.76     -0.7      0.8
3     DD       L1     0.90     -0.5      0.4
5     FF       N1     0.70     -0.8      0.9

时间中的所有组合:

#len df = 70k, 5 columns
df = pd.concat([df]*10000).reset_index(drop=True)

In [47]: %timeit (df[(df.filter(like='Value').abs() < 1).all(axis=1)])
100 loops, best of 3: 7.48 ms per loop

In [48]: %timeit (df[df.filter(regex=r'Value').abs().lt(1).all(1)])
100 loops, best of 3: 7.02 ms per loop

In [49]: %timeit (df[df.filter(like='Value').abs().lt(1).all(1)])
100 loops, best of 3: 7.02 ms per loop

In [50]: %timeit (df[(df.filter(regex=r'Value').abs() < 1).all(axis=1)])
100 loops, best of 3: 7.3 ms per loop


#len df = 70k, 5k columns
df = pd.concat([df]*10000).reset_index(drop=True)
df = pd.concat([df]*1000, axis=1)
#only for testing, create unique columns names
df.columns = df.columns.str[:-1] + [str(col) for col in list(range(df.shape[1]))]
print (df)

In [75]: %timeit ((df[(df.filter(like='Value').abs() < 1).all(axis=1)]))
1 loop, best of 3: 10.3 s per loop

In [76]: %timeit ((df[(df.filter(regex=r'Value').abs() < 1).all(axis=1)]))
1 loop, best of 3: 10.3 s per loop

In [77]: %timeit (df[df.filter(regex=r'Value').abs().lt(1).all(1)])
1 loop, best of 3: 10.4 s per loop

In [78]: %timeit (df[df.filter(like='Value').abs().lt(1).all(1)])
1 loop, best of 3: 10.1 s per loop

这篇关于当列数是动态的时,如何从数据帧中过滤数据?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆