Taking a proportion of a dataframe based on column values

Question

I have a Pandas dataframe with ~50,000 rows and I want to randomly select a proportion of rows from that dataframe based on a number of conditions. Specifically, I have a column called 'type of use' and, for each distinct value in that column, I want to select a different proportion of rows.

For example:

df[df['type of use'] == 'housing'].sample(frac=0.2)

This code returns 20% of the rows that have 'housing' as their 'type of use'. The problem is that I do not know how to do this for the remaining values in an idiomatic way. I also do not know how to combine the results of this sampling into a new dataframe.

Recommended answer

You can get the distinct values in the column with df['type of use'].unique() and iterate over them as shown below:

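# Loop over each distinct value and print a 20% sample of its rows.
for use_type in df['type of use'].unique():
    print(df[df['type of use'] == use_type].sample(frac=0.2))

# Equivalent index-based loop; unique() is computed once instead of on every pass.
use_types = df['type of use'].unique()
i = 0
while i < len(use_types):
    df1 = df[df['type of use'] == use_types[i]].sample(frac=0.2)
    print(df1.head())
    i = i + 1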
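
When every value should be sampled at the same fraction, the loops above can likely be collapsed into a single groupby call. The following is a minimal sketch, assuming pandas 1.1 or later (where groupby objects gained a sample method); the toy dataframe, the 'area' column, and the random_state value are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Toy data standing in for the real dataframe (values are made up).
df = pd.DataFrame({
    "type of use": ["housing"] * 10 + ["retail"] * 5 + ["office"] * 5,
    "area": range(20),
})

# Draw 20% of the rows within each 'type of use' group in one call.
# random_state is set only so the sketch is reproducible.
sampled = df.groupby("type of use").sample(frac=0.2, random_state=0)

# Show how many rows were drawn for each value.
print(sampled["type of use"].value_counts())
```

The result is already a single dataframe, so no separate step is needed to collect the per-value samples.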

To store the samples, you can create a dictionary:

# Build one key per distinct value, e.g. 'dfhousing'.
use_types = df['type of use'].unique()
dfs = ['df' + str(x) for x in use_types]
dicdf = dict()
i = 0
while i < len(dfs):
    # Store a 20% sample of the rows for this value under its key.
    dicdf[dfs[i]] = df[df['type of use'] == use_types[i]].sample(frac=0.2)
    i = i + 1
print(dicdf)

This prints a dictionary of dataframes. You can then print whichever one you want to see, for example the housing sample: print(dicdf['dfhousing'])
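
The question also asks for a different proportion per value and for a single new dataframe, which the dictionary approach does not directly give. One possible way to do both is sketched below. The fractions mapping, the toy data, and the names parts and new_df are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data standing in for the real dataframe (values are made up).
df = pd.DataFrame({
    "type of use": ["housing"] * 10 + ["retail"] * 10 + ["office"] * 10,
    "area": range(30),
})

# Hypothetical per-value fractions; replace with the proportions you need.
fractions = {"housing": 0.2, "retail": 0.5, "office": 0.1}

# Sample each group with its own fraction.
parts = [
    group.sample(frac=fractions[name], random_state=0)
    for name, group in df.groupby("type of use")
]

# Stack the per-value samples into one new dataframe.
new_df = pd.concat(parts)
print(new_df["type of use"].value_counts())
```

The combined dataframe keeps the original row labels; calling new_df.reset_index(drop=True) would renumber them if that is preferred. Every value in the column must have an entry in fractions, otherwise the lookup raises a KeyError.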
