How to remove, randomly, rows from a dataframe but from each label?


Question

This is for a machine learning project.

I have a dataframe with 5 feature columns and 1 label column (Figure A).

I want to randomly remove 2 rows from each label. The dataframe has 12 rows (4 per label), so I should end up with 6 rows (2 per label) (Figure B).

How can I do this? Would it be easier with NumPy alone?

Figure A

Figure B

Here is my code:

# THIS IS FOR FIGURE A
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(12, 5))

label=np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

df['label'] = label
df.index=['s1', 's1', 's1', 's1', 's2', 's2', 's2', 's2', 's3', 's3', 's3', 's3']
df

#THIS IS MY ATTEMPT FOR FIGURE B
dfs = df.sample(n=2)
dfs

Recommended answer

With groupby.apply:

df.groupby('label', as_index=False).apply(lambda x: x.sample(2)) \
                                   .reset_index(level=0, drop=True)
Out: 
           0         1         2         3         4  label
s1  0.433731  0.886622  0.683993  0.125918  0.398787      1
s1  0.719834  0.435971  0.935742  0.885779  0.460693      1
s2  0.324877  0.962413  0.366274  0.980935  0.487806      2
s2  0.600318  0.633574  0.453003  0.291159  0.223662      2
s3  0.741116  0.167992  0.513374  0.485132  0.550467      3
s3  0.301959  0.843531  0.654343  0.726779  0.594402      3
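As a possible alternative to the apply call above, pandas 1.1 and later provide a sample method directly on the groupby object. The sketch below rebuilds the example frame from the question; the random_state value of 0 is an arbitrary assumption, added only so the draw is repeatable.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (random feature values).
df = pd.DataFrame(np.random.rand(12, 5))
df['label'] = np.repeat([1, 2, 3], 4)
df.index = np.repeat(['s1', 's2', 's3'], 4)

# Keep 2 randomly chosen rows per label.
# random_state=0 is an arbitrary seed chosen for this illustration.
sampled = df.groupby('label').sample(n=2, random_state=0)
print(sampled)
```

Because the index labels repeat (s1, s2, s3), sampling the rows to keep is safer here than calling df.drop on sampled labels, which would remove every row sharing that label.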

A cleaner way, in my opinion, is to pass a generator expression to pd.concat:

pd.concat(g.sample(2) for idx, g in df.groupby('label'))

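This gives a result of the same form (the rows selected differ from run to run because the sampling is random):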

           0         1         2         3         4  label
s1  0.442293  0.470318  0.559764  0.829743  0.146971      1
s1  0.603235  0.218269  0.516422  0.295342  0.466475      1
s2  0.569428  0.109494  0.035729  0.548579  0.760698      2
s2  0.600318  0.633574  0.453003  0.291159  0.223662      2
s3  0.412750  0.079504  0.433272  0.136108  0.740311      3
s3  0.462627  0.025328  0.245863  0.931857  0.576927      3
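On the question of doing this with NumPy alone: it is possible, though arguably not easier. The following is a minimal sketch, assuming the features and labels are held in two separate arrays. The names features, labels, and keep, and the seed value, are illustrative and not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Stand-ins for the question's data: 12 rows, 5 features, 3 labels.
features = rng.random((12, 5))
labels = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

# For each label, pick 2 distinct row positions to keep.
keep = np.concatenate([
    rng.choice(np.flatnonzero(labels == lab), size=2, replace=False)
    for lab in np.unique(labels)
])
keep.sort()  # restore the original row order

features_kept = features[keep]
labels_kept = labels[keep]
print(labels_kept)
```

Selecting positions to keep, rather than positions to delete, mirrors the pandas answers; np.delete(features, ..., axis=0) with the complementary positions would be the removal-oriented equivalent.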
