Random sample of a subset of a DataFrame in pandas
Question
Say I have a DataFrame with 100,000 entries and want to split it into 100 sections of 1,000 entries each.
How do I take a random sample of, say, size 50 from just one of the 100 sections? The dataset is already ordered, so the first 1,000 rows are the first section, the next 1,000 rows are the second section, and so on.
Many thanks.
Recommended answer
One solution is to use the choice function from numpy.
Say you want 50 entries out of the first 1,000 rows; you can use:
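import numpy as np
# Draw 50 distinct positions from 0..999 (replace=False means no repeats)
chosen_idx = np.random.choice(1000, replace=False, size=50)
# Select those rows by integer position
df_trimmed = df.iloc[chosen_idx]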
This of course ignores your block structure. If you want a 50-item sample from block i (counting blocks from 0), for example, you can do:
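import numpy as np
# Position of the first row of block i (i is zero-based)
block_start_idx = 1000 * i
# Draw 50 distinct offsets within the block
chosen_idx = np.random.choice(1000, replace=False, size=50)
# Shift the offsets into block i and select by integer position
df_trimmed_from_block_i = df.iloc[block_start_idx + chosen_idx]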
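As an alternative to building the positions by hand, the same block sample can probably be taken by slicing the block with iloc and calling pandas' own DataFrame.sample on it. The sketch below is only an illustration under stated assumptions: the frame df with a single "value" column is a made-up stand-in for the asker's data, and sample_block, block_size, n and seed are hypothetical names introduced here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 100,000-row frame described in the question.
df = pd.DataFrame({"value": np.arange(100_000)})


def sample_block(frame, i, block_size=1000, n=50, seed=None):
    """Return n rows drawn without replacement from block i (zero-based).

    Assumes the frame is already ordered so that block i occupies
    positions block_size * i up to block_size * (i + 1) - 1.
    """
    start = block_size * i
    block = frame.iloc[start:start + block_size]
    # random_state makes the draw reproducible when a seed is given.
    return block.sample(n=n, replace=False, random_state=seed)


# Example: 50 rows from the fourth block (i = 3), i.e. positions 3000..3999.
sample = sample_block(df, i=3, seed=0)
```

Slicing first keeps the block boundaries explicit, and passing a seed is optional: leave it as None for a fresh draw each time.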