Using Pandas to sample a DataFrame using a specific column's weight

Views: 69 Published: 2021/6/13 20:36:53 Tags: python pandas dataframe statistics


Question

I have a DataFrame that looks like this:

  index  name   city
  0      Yam    Hadera
  1      Meow   Hadera
  2      Don    Hadera
  3      Jazz   Hadera
  4      Bond   Tel Aviv
  5      James  Tel Aviv

I want Pandas to randomly choose rows, weighted by the number of times each value appears in the city column (roughly what df.city.value_counts() returns). So the result of my magic function, say:

df.magic_sample(3, weight_column='city')

might look like:

  0     Yam      Hadera
  1     Meow     Hadera
  2     Bond     Tel Aviv

Thanks! :)

Recommended answer

You can group by city and then sample each group in proportion to its length relative to the length of the original DataFrame:

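# In Python 3, `/` yields a float, so round to an integer row count before passing it as n
df.groupby('city', group_keys=False).apply(lambda g: g.sample(n=round(3 * len(g) / len(df))))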
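
For reference, here is a self-contained sketch of the same idea on the example data from the question. The helper names `proportional_sample` and `count_weighted_sample` are hypothetical (they are not part of pandas), and the second helper is an alternative reading of the question, not the approach from the answer: it passes per-row weights derived from `value_counts()` to `DataFrame.sample`, so the city mix varies from draw to draw instead of being fixed at 2 and 1.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "name": ["Yam", "Meow", "Don", "Jazz", "Bond", "James"],
    "city": ["Hadera", "Hadera", "Hadera", "Hadera", "Tel Aviv", "Tel Aviv"],
})


def proportional_sample(frame, n, column, random_state=None):
    """Hypothetical helper: draw rows from each group in proportion to its size."""
    parts = []
    for _, group in frame.groupby(column):
        # Share of n for this group, rounded to a whole number of rows
        k = round(n * len(group) / len(frame))
        parts.append(group.sample(n=k, random_state=random_state))
    return pd.concat(parts).sort_index()


def count_weighted_sample(frame, n, column, random_state=None):
    """Hypothetical helper: weight each row by how often its value occurs."""
    # Map every row's value to its frequency, e.g. Hadera -> 4, Tel Aviv -> 2
    weights = frame[column].map(frame[column].value_counts())
    # DataFrame.sample normalizes the weights so they sum to 1
    return frame.sample(n=n, weights=weights, random_state=random_state)


print(proportional_sample(df, 3, "city", random_state=0))
print(count_weighted_sample(df, 3, "city", random_state=0))
```

Because each group's share is rounded independently, `proportional_sample` can return slightly more or fewer than `n` rows when the proportions are not whole numbers; with the data above it returns exactly two rows from Hadera and one from Tel Aviv.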
