Drop Columns with more than 60 Percent of "empty" Values in Pandas

Problem description

I have got a dataframe like this:

import pandas as pd
data = {
    'c1': ['Test1','Test2','NULL','Test3',' ','Test4','Test4','Test1',"Test3"],
    'c2': [' ','Test1',' ','NULL',' ','NULL','NULL','NULL','NULL'],
    'c3': [0,0,0,0,0,1,5,0,0],
    'c4': ['NULL', 'Test2', 'Test1','Test1', 'Test2', 'Test2','Test1','Test1','Test2']
}
df = pd.DataFrame(data)
df

The dataframe looks like this:

    c1      c2      c3      c4
0   Test1           0       NULL
1   Test2   Test1   0       Test2
2   NULL            0       Test1
3   Test3   NULL    0       Test1
4                   0       Test2
5   Test4   NULL    1       Test2
6   Test4   NULL    5       Test1
7   Test1   NULL    0       Test1
8   Test3   NULL    0       Test2

I want to drop all columns that have more than 60% "empty" values. In my case, "empty" means values such as ' ', 'NULL' or 0. There are string columns (c1, c2, c4) as well as an integer column (c3).
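A minimal sketch of how the share of "empty" values per column can be measured. It assumes that ' ', 'NULL' and 0 are the complete list of sentinels (the question only gives them as examples), and `EMPTY` is a hypothetical name chosen for illustration. `DataFrame.isin` marks every matching cell as True, and the mean of a boolean column is the fraction of True values.

```python
import pandas as pd

# Data from the question.
data = {
    'c1': ['Test1', 'Test2', 'NULL', 'Test3', ' ', 'Test4', 'Test4', 'Test1', 'Test3'],
    'c2': [' ', 'Test1', ' ', 'NULL', ' ', 'NULL', 'NULL', 'NULL', 'NULL'],
    'c3': [0, 0, 0, 0, 0, 1, 5, 0, 0],
    'c4': ['NULL', 'Test2', 'Test1', 'Test1', 'Test2', 'Test2', 'Test1', 'Test1', 'Test2'],
}
df = pd.DataFrame(data)

# Assumed list of "empty" sentinels (hypothetical name EMPTY).
EMPTY = [' ', 'NULL', 0]

# Boolean frame: True wherever a cell equals one of the sentinels.
# A string cell never equals 0, so 0 only matches the integers in c3.
is_empty = df.isin(EMPTY)

# Mean of each boolean column = fraction of empty cells in that column.
empty_share = is_empty.mean()
print(empty_share.round(3))
```

For the sample data this gives 2/9 for c1, 8/9 for c2, 7/9 for c3 and 1/9 for c4, so only c2 and c3 exceed 0.6.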

The result should be a dataframe with columns c1 and c4 only.

    c1      c4
0   Test1   NULL
1   Test2   Test2
2   NULL    Test1
3   Test3   Test1
4           Test2
5   Test4   Test2
6   Test4   Test1
7   Test1   Test1
8   Test3   Test2
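The expected output above can also be reached by a different route than the one in the answer below: turn the sentinels into real missing values and let `dropna(thresh=...)` count the filled cells. This is a sketch under the same assumed sentinel list; `EMPTY` and `MAX_EMPTY_SHARE` are hypothetical names. "At most 60% empty" is rewritten as "at least 40% filled", because `thresh` is the minimum number of non-missing values a column needs to survive.

```python
import math

import pandas as pd

# Data from the question.
data = {
    'c1': ['Test1', 'Test2', 'NULL', 'Test3', ' ', 'Test4', 'Test4', 'Test1', 'Test3'],
    'c2': [' ', 'Test1', ' ', 'NULL', ' ', 'NULL', 'NULL', 'NULL', 'NULL'],
    'c3': [0, 0, 0, 0, 0, 1, 5, 0, 0],
    'c4': ['NULL', 'Test2', 'Test1', 'Test1', 'Test2', 'Test2', 'Test1', 'Test1', 'Test2'],
}
df = pd.DataFrame(data)

EMPTY = [' ', 'NULL', 0]  # assumed sentinels (hypothetical name)
MAX_EMPTY_SHARE = 0.6     # drop columns that are more than 60% empty

# Copy of df in which every sentinel is replaced by NaN.
masked = df.mask(df.isin(EMPTY))

# A column is kept only if it has at least this many non-missing cells.
min_filled = math.ceil((1 - MAX_EMPTY_SHARE) * len(df))

# Take the surviving column names, but select them from the original
# frame so that 'NULL' and ' ' remain in the output as in the expected result.
kept = masked.dropna(axis=1, thresh=min_filled).columns
result = df[kept]
print(result)
```

The detour through NaN is mainly useful when the data also contains genuine missing values that should count as empty; otherwise the `isin(...).mean()` approach is shorter.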

I have no idea how to handle this problem. The only thing that comes to mind is something like

df.loc[:, (df != 0).any(axis=0)]

to delete all columns where all values are 0, 'NULL' and so on.
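The idea above can be generalised from 0 to every sentinel with `isin`, which shows why an "all values are empty" rule is not enough here. This is a sketch; the sentinel list `EMPTY` is again an assumption. It drops only columns in which every cell is empty, and in the sample data no column is completely empty.

```python
import pandas as pd

# Data from the question.
data = {
    'c1': ['Test1', 'Test2', 'NULL', 'Test3', ' ', 'Test4', 'Test4', 'Test1', 'Test3'],
    'c2': [' ', 'Test1', ' ', 'NULL', ' ', 'NULL', 'NULL', 'NULL', 'NULL'],
    'c3': [0, 0, 0, 0, 0, 1, 5, 0, 0],
    'c4': ['NULL', 'Test2', 'Test1', 'Test1', 'Test2', 'Test2', 'Test1', 'Test1', 'Test2'],
}
df = pd.DataFrame(data)

EMPTY = [' ', 'NULL', 0]  # assumed sentinels (hypothetical name)

# Generalisation of (df != 0).any(axis=0): keep a column unless
# every one of its cells is an empty sentinel.
keep = ~df.isin(EMPTY).all(axis=0)
all_empty_dropped = df.loc[:, keep]

# c2 still contains 'Test1' and c3 contains 1 and 5, so nothing is dropped.
print(list(all_empty_dropped.columns))
```

All four columns survive, so a percentage threshold is needed instead of an all-or-nothing test.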

Recommended answer

Use DataFrame.isin to check for all the formats, then take the mean for the threshold and filter by …
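The answer's code is cut off, so the following is only a sketch of the idea it describes, not the answer's own code. The helper name `drop_mostly_empty` and its parameters are hypothetical. It keeps columns whose empty share is `<= 0.6`, which matches "more than 60%" in the question; a strict `< 0.6` would also drop a column that is exactly 60% empty.

```python
import pandas as pd


def drop_mostly_empty(frame: pd.DataFrame, empty_values, max_share: float = 0.6) -> pd.DataFrame:
    """Return frame without the columns whose share of empty values exceeds max_share."""
    # Fraction of cells per column that equal one of the empty sentinels.
    empty_share = frame.isin(empty_values).mean()
    # Boolean column selector: keep columns at or below the limit.
    return frame.loc[:, empty_share <= max_share]


# Data from the question.
data = {
    'c1': ['Test1', 'Test2', 'NULL', 'Test3', ' ', 'Test4', 'Test4', 'Test1', 'Test3'],
    'c2': [' ', 'Test1', ' ', 'NULL', ' ', 'NULL', 'NULL', 'NULL', 'NULL'],
    'c3': [0, 0, 0, 0, 0, 1, 5, 0, 0],
    'c4': ['NULL', 'Test2', 'Test1', 'Test1', 'Test2', 'Test2', 'Test1', 'Test1', 'Test2'],
}
df = pd.DataFrame(data)

# Assumed sentinel list; extend it if other placeholders occur.
result = drop_mostly_empty(df, [' ', 'NULL', 0])
print(result)
```

Only c1 and c4 remain, with their original values (including 'NULL' and ' ') left untouched, as in the expected result from the question.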
