pandas: groupby two columns and get random selection of groups such that each value in the first column will be represented by a single group

Views: 21  Published: 2022/2/26 21:22:39  Tags: python, pandas, pandas-groupby

Problem description

Similar to this question, but with added complexity.
In my example, I have the following dataframe:

import pandas as pd    
df = pd.DataFrame({'col1': list('aaabbbabababbaaa'), 'col2': list('cdddccdsssssddcd'), 'val': range(0, 16)})

Output:

   col1 col2  val
0     a    c    0
1     a    d    1
2     a    d    2
3     b    d    3
4     b    c    4
5     b    c    5
6     a    d    6
7     b    s    7
8     a    s    8
9     b    s    9
10    a    s   10
11    b    s   11
12    b    d   12
13    a    d   13
14    a    c   14
15    a    d   15

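My goal is to select random groups of groupby(['col1', 'col2']) such that each value of col1 is selected only once, i.e. each col1 value ends up represented by exactly one (col1, col2) group. This can be done with the following code: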

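import numpy as np

g = df.groupby('col1')
indexes = []
for _, group in g:
    # Within one col1 value, number the col2 groups and shuffle those numbers
    g_ = group.groupby('col2')
    a = np.arange(g_.ngroups)
    np.random.shuffle(a)
    # Keep the rows of the first shuffled group number, i.e. one random col2 group
    indexes.extend(group[g_.ngroup().isin(a[:1])].index.tolist())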

Output (one possible result, since the selection is random):

print(df[df.index.isin(indexes)])
   col1 col2  val
4     b    c    4
5     b    c    5
8     a    s    8
10    a    s   10

However, I am looking for a more concise and efficient way to solve this.

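Recommended answer

Another option is to shuffle your two columns with sample and then drop_duplicates by col1, so that you keep only one (col1, col2) pair per col1 value. Then merge the result with df to select all the rows that have these pairs.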

print(df.merge(df[['col1','col2']].sample(frac=1).drop_duplicates('col1')))
  col1 col2  val
0    b    s    7
1    b    s    9
2    b    s   11
3    a    s    8
4    a    s   10
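
The one-liner above can be unpacked into steps to make the mechanics easier to follow. The following is a minimal sketch, not part of the original answer: the names pairs and selected are made up for illustration, and random_state=0 is an arbitrary seed added only so that repeated runs give the same selection.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': list('aaabbbabababbaaa'),
    'col2': list('cdddccdsssssddcd'),
    'val': range(0, 16),
})

# Shuffle the (col1, col2) rows, then keep the first pair seen for each col1.
# random_state=0 is an arbitrary seed (an assumption, used for reproducibility).
pairs = df[['col1', 'col2']].sample(frac=1, random_state=0).drop_duplicates('col1')

# With no "on" argument, merge joins on the shared columns (col1 and col2),
# so the inner merge keeps every row of df that belongs to a chosen pair.
selected = df.merge(pairs)

# Each col1 value should now map to exactly one col2 value.
print(selected.groupby('col1')['col2'].nunique())
```

Because pairs holds one row per col1 value, the merge cannot duplicate rows of df.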

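Or, along much the same lines, use groupby and sample to select only one row per col1 value, then merge afterwards (groupby(...).sample requires pandas 1.1 or later):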

df.merge(df[['col1','col2']].groupby('col1').sample(n=1))
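
One point worth noting: groupby('col1').sample(n=1) draws a row, so a (col1, col2) group with more rows is more likely to be picked, whereas the loop in the question shuffles group numbers and so gives each group the same chance. If equal chances per group matter, one possible variation (a sketch, not part of the original answer) is to drop duplicate pairs before sampling. The names unique_pairs, chosen and selected are illustrative, and random_state=42 is an arbitrary seed.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': list('aaabbbabababbaaa'),
    'col2': list('cdddccdsssssddcd'),
    'val': range(0, 16),
})

# One row per distinct (col1, col2) group, so every group carries equal weight.
unique_pairs = df[['col1', 'col2']].drop_duplicates()

# Draw one group per col1 value (needs pandas >= 1.1 for groupby(...).sample).
# random_state=42 is an arbitrary seed (an assumption, used for reproducibility).
chosen = unique_pairs.groupby('col1').sample(n=1, random_state=42)

# Select all rows of df that belong to the chosen groups.
selected = df.merge(chosen)
print(selected)
```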

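EDIT: to get both the selected rows and the remaining rows, you can use the indicator parameter in the merge and do a left merge. Then query each part separately: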

m = df.merge(df[['col1','col2']].groupby('col1').sample(1), how='left', indicator=True)
print(m)
select_ = m.query('_merge=="both"')[df.columns]
print(select_)
comp_ = m.query('_merge=="left_only"')[df.columns]
print(comp_)
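
As a sanity check on the split, the sketch below (an illustration, not part of the original answer) repeats the left merge with an arbitrary seed, random_state=1, and confirms that the two parts are disjoint and together contain every row of df. The name chosen is introduced here for readability.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': list('aaabbbabababbaaa'),
    'col2': list('cdddccdsssssddcd'),
    'val': range(0, 16),
})

# One random (col1, col2) pair per col1 value; the seed is an arbitrary assumption.
chosen = df[['col1', 'col2']].groupby('col1').sample(n=1, random_state=1)

# The left merge keeps every row of df; _merge marks whether its pair was chosen.
m = df.merge(chosen, how='left', indicator=True)

select_ = m.query('_merge == "both"')[df.columns]
comp_ = m.query('_merge == "left_only"')[df.columns]

# The two parts should add up to the original number of rows.
print(len(select_), len(comp_), len(df))
```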
