Pandas Random Weighted Choice
Question
I would like to randomly select a value, taking weightings into account, using Pandas.
df:
    0  1   2   3   4   5
0  40  5  20  10  35  25
1  24  3  12   6  21  15
2  72  9  36  18  63  45
3   8  1   4   2   7   5
4  16  2   8   4  14  10
5  48  6  24  12  42  30
I know how to do this with np.random.choice, for example:
x = np.random.choice(
['0-0','0-1',etc.],
1,
p=[0.4,0.24 etc.]
)
And so, I would like to get an output from df in a similar style to np.random.choice (or by an alternative method), but using Pandas. I would like to do this more efficiently than by manually inserting the values as I have done above.
With np.random.choice, I am aware that all the probabilities must add up to 1. I'm not sure how to go about solving this, nor how to randomly select a value based on weightings using Pandas.
As for the output: if the randomly selected weight were, for example, 40, then the output would be 0-0, since that value is located in column 0, row 0, and so on.
Recommended answer
Stack the DataFrame:
stacked = df.stack()
Normalize the weights (so that they add up to 1):
weights = stacked / stacked.sum()
# As GeoMatt22 pointed out, this part is not necessary. See the other comment.
Then use sample:
stacked.sample(1, weights=weights)
Out:
1 2 12
dtype: int64
# Or without normalization, stacked.sample(1, weights=stacked)
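The steps above can be put together as a minimal, self-contained sketch. The DataFrame literal simply re-enters the values of df from the question; random_state and the names picked and label are additions for illustration and are not part of the original answer. The label follows the "column-row" convention the question describes.

```python
import pandas as pd

# Re-create df from the question (default integer row and column labels).
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

# One entry per cell, indexed by (row, column).
stacked = df.stack()

# Use the cell values themselves as weights; sample() rescales weights
# that do not sum to 1. random_state is only here for reproducibility.
picked = stacked.sample(1, weights=stacked, random_state=0)

# The MultiIndex entry of the sampled cell is a (row, column) pair.
row, col = picked.index[0]
label = f"{col}-{row}"  # "column-row", as in the question
print(label, picked.iloc[0])
```

Dropping random_state gives a different cell on each call.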
The DataFrame.sample method allows you to sample either from rows or from columns. Consider this:
df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
Out:
    0  1   2  3   4   5
1  24  3  12  6  21  15
It selects one row (the first row with a 40% chance, the second with a 30% chance, etc.).
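One way to see that the weights behave as per-row probabilities is to draw many rows with replacement and compare the observed frequencies with the weights. This is only an illustrative check, not part of the original answer; the number of draws and random_state are arbitrary choices.

```python
import pandas as pd

# Re-create df from the question.
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

weights = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]

# Draw many rows with replacement so every row can be picked repeatedly.
draws = df.sample(n=10_000, replace=True, weights=weights, random_state=0)

# Fraction of draws that landed on each row label, in the original row order.
freq = draws.index.value_counts(normalize=True).reindex(df.index, fill_value=0)
print(freq)
```

The printed fractions should be close to the weights, with row 0 appearing in roughly 40% of the draws.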
Sampling from columns is also possible:
df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05], axis=1)
Out:
   1
0  5
1  3
2  9
3  1
4  2
5  6
This is the same process, but now the 40% chance is associated with the first column, and we are selecting from columns. However, your question seems to imply that you don't want to select rows or columns; you want to select the cells inside them. Therefore, I changed the dimension from 2D to 1D.
df.stack()
Out:
0  0    40
   1     5
   2    20
   3    10
   4    35
   5    25
1  0    24
   1     3
   2    12
   3     6
   4    21
   5    15
2  0    72
   1     9
   2    36
   3    18
   4    63
   5    45
3  0     8
   1     1
   2     4
   3     2
   4     7
   5     5
4  0    16
   1     2
   2     8
   3     4
   4    14
   5    10
5  0    48
   1     6
   2    24
   3    12
   4    42
   5    30
dtype: int64
So if I now sample from this, I sample a row and a column at the same time. For example:
df.stack().sample()
Out:
1 0 24
dtype: int64
This selects row 1 and column 0.
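Since the question starts from np.random.choice, the same cell selection can also be sketched directly in NumPy, without typing the labels and probabilities by hand. This is an alternative to the stack-and-sample approach, not something the answer itself proposes; it uses NumPy's Generator API (np.random.default_rng) rather than the legacy np.random.choice call, and the seed and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Re-create df from the question.
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

rng = np.random.default_rng(0)  # seeded only for reproducibility

values = df.to_numpy()
# Flatten the weights row by row and normalize them so they sum to 1,
# which NumPy requires for the p argument.
p = values.ravel() / values.sum()

# Pick one flat position, then convert it back to (row, column) positions.
flat = rng.choice(values.size, p=p)
row_pos, col_pos = np.unravel_index(flat, values.shape)

# Map positions to labels and build the "column-row" string from the question.
label = f"{df.columns[col_pos]}-{df.index[row_pos]}"
print(label, values[row_pos, col_pos])
```

Unlike sample, NumPy does not rescale the weights itself, which is why the explicit normalization step is needed here.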