Pandas: Remove rows at random without shuffling dataset

Question

I've got a dataset from which I need to omit a few rows while preserving the order of the remaining rows. My idea was to build a mask from random numbers between 0 and the length of my dataset, but I'm not sure how to set up such a mask without shuffling the rows around, i.e. something similar to sampling a dataset but with the original order kept.

Example: the dataset has 5 rows and 2 columns, and I would like to remove one row at random.

Col1 | Col2
  A  |  1
  B  |  2 
  C  |  5     
  D  |  4
  E  |  0

would be transformed into:

Col1 | Col2
  A  |  1
  B  |  2   
  D  |  4
  E  |  0

with the third row (Col1='C') omitted by a random choice.

How should I do this?

Recommended answer

The following should work for you. Here I sample remove_n random row labels from df's index, without replacement. df.drop then removes those rows and returns a new DataFrame holding the remaining rows in their original order; df itself is left unchanged.

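import pandas as pd
import numpy as np
np.random.seed(10)  # fixed seed so the example is reproducible

remove_n = 1  # number of rows to remove
df = pd.DataFrame({"a":[1,2,3,4], "b":[5,6,7,8]})
# Pick remove_n distinct index labels at random.
drop_indices = np.random.choice(df.index, remove_n, replace=False)
# Drop those labels; the remaining rows keep their original order.
df_subset = df.drop(drop_indices)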

DataFrame df:

    a   b
0   1   5
1   2   6
2   3   7
3   4   8

DataFrame df_subset:

    a   b
0   1   5
1   2   6
3   4   8
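
The question mentions using a mask, so here is a minimal sketch of that variant, applied to the example dataset from the question. It is not part of the original answer. It assumes NumPy's Generator API (np.random.default_rng) is available, and the names keep_mask, drop_positions and df_subset_alt are made up for illustration. Because it picks row positions rather than index labels, it should also behave sensibly when the index contains duplicate labels.

```python
import numpy as np
import pandas as pd

# The example dataset from the question.
df = pd.DataFrame({"Col1": list("ABCDE"), "Col2": [1, 2, 5, 4, 0]})

remove_n = 1                     # number of rows to drop
rng = np.random.default_rng(10)  # seeded only to make the run repeatable

# Choose distinct row positions (not index labels) to drop.
drop_positions = rng.choice(len(df), size=remove_n, replace=False)

# Boolean mask: True for rows to keep, False for rows to drop.
keep_mask = np.ones(len(df), dtype=bool)
keep_mask[drop_positions] = False

# Boolean indexing keeps the surviving rows in their original order.
df_subset = df[keep_mask]

# A similar result using DataFrame.sample, assuming unique index labels:
# sample the rows to drop, then remove them by label.
df_subset_alt = df.drop(df.sample(n=remove_n, random_state=10).index)
```

Both variants leave df untouched and return the remaining rows in their original order. Which row is removed depends on the seed.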
