Taking a proportion of a dataframe based on column values
Question
I have a Pandas dataframe with ~50,000 rows, and I want to randomly select a proportion of rows from it based on a number of conditions. Specifically, I have a column called 'type of use' and, for each distinct value in that column, I want to select a different proportion of rows.
For example:
df[df['type of use'] == 'housing'].sample(frac=0.2)
This code returns 20% of all the rows that have 'housing' as their 'type of use'. The problem is that I do not know how to do this for the remaining values in an idiomatic way. I also do not know how to combine the results of this sampling into a new dataframe.
Recommended answer
You can get the unique values in the column with list(df['type of use'].unique()) and iterate over them like this:
for i in list(df['type of use'].unique()):
    print(df[df['type of use'] == i].sample(frac=0.2))
Or:
# Compute the unique values once instead of on every iteration
use_types = list(df['type of use'].unique())
i = 0
while i < len(use_types):
    df1 = df[df['type of use'] == use_types[i]].sample(frac=0.2)
    print(df1.head())
    i = i + 1
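The loops above apply the same fraction to every value, which pandas can also do in a single call. The following is a minimal sketch rather than part of the original answer: it assumes pandas 1.1 or later (where groupby objects gained a sample method), and the toy dataframe and random_state are illustrative stand-ins for the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for the ~50,000-row dataframe.
df = pd.DataFrame({
    "type of use": ["housing"] * 10 + ["office"] * 5 + ["retail"] * 5,
    "value": range(20),
})

# Sample 20% of the rows within each 'type of use' group in one call.
# random_state is fixed only so the result is reproducible.
sampled = df.groupby("type of use").sample(frac=0.2, random_state=0)

print(sampled["type of use"].value_counts())
```

The result is already a single dataframe, so no extra step is needed to combine the per-group samples.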
To store the samples, you can create a dictionary:
# Build one key per value, e.g. 'dfhousing'
use_types = list(df['type of use'].unique())
dfs = ['df' + str(x) for x in use_types]
dicdf = dict()
i = 0
while i < len(dfs):
    # Store a 20% sample of the rows for this value under its key
    dicdf[dfs[i]] = df[df['type of use'] == use_types[i]].sample(frac=0.2)
    i = i + 1
print(dicdf)
This prints a dictionary of dataframes.
You can print just the one you want to see, for example the housing sample: print(dicdf['dfhousing'])
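The question also asks for a different proportion per value and for a single new dataframe. One possible way to do that is sketched below. It is not from the original answer: the fractions dictionary, the default_frac fallback, and the toy data are all hypothetical and would need to be replaced with real values.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe.
df = pd.DataFrame({
    "type of use": ["housing"] * 10 + ["office"] * 10 + ["retail"] * 10,
    "value": range(30),
})

# Hypothetical per-type fractions; any type not listed uses default_frac.
fractions = {"housing": 0.2, "office": 0.5}
default_frac = 0.1

# Sample each group with its own fraction.
# random_state is fixed only so the result is reproducible.
parts = [
    group.sample(frac=fractions.get(use_type, default_frac), random_state=0)
    for use_type, group in df.groupby("type of use")
]

# Stack the per-type samples into one new dataframe.
new_df = pd.concat(parts)

print(new_df["type of use"].value_counts())
```

Because pd.concat keeps the original index labels, rows in new_df can still be traced back to df; pass ignore_index=True if a fresh 0..n-1 index is preferred.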