"Drop random rows" from pandas dataframe
Question
In a pandas dataframe, how can I drop a random subset of the rows that satisfy a condition?
In other words, if I have a pandas dataframe with a Label column, I'd like to drop 50% (or some other percentage) of the rows where Label == 1, but keep all of the rest:
Label  A     ->    Label  A
0      1           0      1
0      2           0      2
0      3           0      3
1      10          1      11
1      11          1      12
1      12
1      13
I'd love to know the simplest and most pythonic/pandas-ish way of doing this!
This question provides part of an answer, but it only talks about dropping rows by index, disregarding the row values. I'd still like to know how to drop only from rows that are labeled a certain way.
Recommended answer
Use the frac parameter of sample to keep a random fraction of the rows:
df.sample(frac=.5)
If you define the fraction you want to drop in a variable n, sample the complement:
n = .5
df.sample(frac=1 - n)
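As a minimal sketch of the unconditional case, the snippet below builds the example frame from the question and keeps a random 1 - n share of its rows. The random_state argument and the sort_index call are additions for illustration (reproducibility and restoring the original row order), not part of the original answer.

```python
import pandas as pd

# Example data taken from the question
df = pd.DataFrame({
    "Label": [0, 0, 0, 1, 1, 1, 1],
    "A": [1, 2, 3, 10, 11, 12, 13],
})

n = 0.5  # fraction of rows to drop

# sample() returns the rows it keeps, in shuffled order;
# random_state is an assumption added here so the draw is repeatable
kept = df.sample(frac=1 - n, random_state=0)

# Restore the original row order if it matters
kept = kept.sort_index()
print(kept)
```

Note that this samples from the whole frame, so rows with Label == 0 can be dropped as well.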
To include a condition, select the matching rows, sample from them, and pass the sampled index to drop:
df.drop(df.query('Label == 1').sample(frac=.5).index)
   Label   A
0      0   1
1      0   2
2      0   3
4      1  11
6      1  13
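The same idea can be wrapped in a small helper. This is only a sketch: the name drop_random_rows and its parameters are hypothetical, and it assumes the dataframe has a unique index, because drop removes every row that carries a sampled index label.

```python
import pandas as pd


def drop_random_rows(df, mask, frac, random_state=None):
    """Drop a random `frac` of the rows where `mask` is True; keep all others.

    Hypothetical helper for illustration; assumes `df.index` is unique.
    """
    # Sample among the matching rows only, then drop them by index label
    to_drop = df[mask].sample(frac=frac, random_state=random_state).index
    return df.drop(to_drop)


# Example data taken from the question
df = pd.DataFrame({
    "Label": [0, 0, 0, 1, 1, 1, 1],
    "A": [1, 2, 3, 10, 11, 12, 13],
})

# Drop half of the rows labelled 1; rows labelled 0 are untouched
result = drop_random_rows(df, df["Label"] == 1, frac=0.5, random_state=42)
print(result)
```

Which of the Label == 1 rows survive depends on the random draw; passing random_state makes the selection repeatable. If the index may contain duplicates, calling reset_index(drop=True) first is one way to avoid dropping more rows than intended.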