Pandas Random Weighted Choice

Views: 77. Published: 2020/5/18 21:07:27. Tags: python, python-2.7, pandas, numpy

Question

I would like to randomly select a value, taking weightings into account, using Pandas.

df:

    0  1   2   3   4   5
0  40  5  20  10  35  25
1  24  3  12   6  21  15
2  72  9  36  18  63  45
3   8  1   4   2   7   5
4  16  2   8   4  14  10
5  48  6  24  12  42  30

I am aware of using np.random.choice, for example:

x = np.random.choice(
    ['0-0', '0-1', ...],  # remaining labels elided in the original
    1,
    p=[0.4, 0.24, ...]    # remaining weights elided in the original
)

So I would like to get output from df in a style similar to np.random.choice, but using Pandas, and in a more efficient way than manually typing in the values as I have done above.

I am aware that with np.random.choice the probabilities must add up to 1. I am not sure how to handle that, nor how to randomly select a value based on weightings using Pandas.

As for the output: if the randomly selected weight were, for example, 40, then the output would be 0-0, since that value is located in column 0, row 0, and so on.

Recommended answer

Stack the DataFrame:

stacked = df.stack()

Normalize the weights (so that they add up to 1):

weights = stacked / stacked.sum()
# As GeoMatt22 pointed out, this part is not necessary. See the other comment.

Then use sample:

stacked.sample(1, weights=weights)
Out: 
1  2    12
dtype: int64

# Or without normalization, stacked.sample(1, weights=stacked)
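
Putting the steps above together, a minimal self-contained sketch might look like the following. The helper name pick_cell and the "row-column" label format are assumptions made for illustration (the question's own labels may be ordered column first, in which case the two values can be swapped); the weights are passed unnormalized, relying on sample normalizing them as noted above.

```python
import pandas as pd

# Rebuild the example frame from the question (default integer labels).
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)


def pick_cell(frame, random_state=None):
    """Hypothetical helper: pick one cell with probability proportional to its value."""
    stacked = frame.stack()  # Series indexed by (row, column)
    picked = stacked.sample(1, weights=stacked, random_state=random_state)
    row, col = picked.index[0]
    # Label order "row-column" is an assumption; swap if "column-row" is wanted.
    return "{}-{}".format(row, col), picked.iloc[0]


label, weight = pick_cell(df, random_state=0)
print(label, weight)
```

Passing random_state only makes the draw reproducible; leave it as None for a fresh random pick on each call.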

The DataFrame.sample method lets you sample from either rows or columns. Consider this:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
Out: 
    0  1   2  3   4   5
1  24  3  12  6  21  15

It selects one row (the first row with a 40% chance, the second with a 30% chance, etc.).
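
One way to sanity-check those row probabilities is to draw many rows with replacement and compare the observed frequencies with the weights. This is only an illustrative sketch; the number of draws and the seed are arbitrary choices, not part of the original answer.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)
row_weights = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]

# Draw many rows with replacement and tally how often each row label appears.
draws = df.sample(n=20000, replace=True, weights=row_weights, random_state=0)
observed = draws.index.value_counts(normalize=True).sort_index()
print(observed)  # frequencies should be close to row_weights
```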

Sampling a column is also possible by passing axis=1:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05], axis=1)
Out: 
   1
0  5
1  3
2  9
3  1
4  2
5  6

The process is the same, but now the 40% chance is associated with the first column, because we are selecting from columns. However, your question seems to imply that you don't want to select rows or columns; you want to select the cells inside. Therefore, I changed the dimension from 2D to 1D.

df.stack()

Out: 
0  0    40
   1     5
   2    20
   3    10
   4    35
   5    25
1  0    24
   1     3
   2    12
   3     6
   4    21
   5    15
2  0    72
   1     9
   2    36
   3    18
   4    63
   5    45
3  0     8
   1     1
   2     4
   3     2
   4     7
   5     5
4  0    16
   1     2
   2     8
   3     4
   4    14
   5    10
5  0    48
   1     6
   2    24
   3    12
   4    42
   5    30
dtype: int64

So if I now sample from this, I sample both a row and a column. For example:

df.stack().sample()
Out: 
1  0    24
dtype: int64

Row 1 and column 0 are selected.
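
For comparison with the np.random.choice call in the question, the stacked Series can also supply the labels and probabilities without typing them by hand. This is a rough sketch rather than part of the original answer; the "row-column" label format is an assumption, and the probabilities are normalized explicitly because np.random.choice requires them to sum to 1.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

stacked = df.stack()  # one entry per cell, indexed by (row, column)

# Build one "row-column" label per cell (label order is an assumption).
labels = ["{}-{}".format(row, col) for row, col in stacked.index]

# Normalize the cell values into probabilities that sum to 1.
probs = (stacked / stacked.sum()).values

x = np.random.choice(labels, 1, p=probs)
print(x)
```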
