How to randomly select some non-zero elements from a numpy.ndarray?

Question

I've implemented a matrix factorization model, say R = U*V, and now I would like to train and test this model.

To this end, given a sparse matrix R (where zero marks a missing value), I want to first hide some of the non-zero elements during training and later use those hidden elements as the test set.

How can I randomly select some non-zero elements from a numpy.ndarray? I also need to remember the row and column positions of the selected elements so that I can use them in testing.

For example:

In [2]: import numpy as np

In [4]: mtr = np.random.rand(10,10)

In [5]: mtr
Out[5]: 
array([[ 0.92685787,  0.95496193,  0.76878455,  0.12304856,  0.13804963,
         0.30867502,  0.60245974,  0.00797898,  0.1060602 ,  0.98277982],
       [ 0.88879888,  0.40209901,  0.35274404,  0.73097713,  0.56238248,
         0.380625  ,  0.16432029,  0.5383006 ,  0.0678564 ,  0.42875591],
       [ 0.42343761,  0.31957986,  0.5991212 ,  0.04898903,  0.2908878 ,
         0.13160296,  0.26938537,  0.91442668,  0.72827097,  0.4511198 ],
       [ 0.63979934,  0.33421621,  0.09218392,  0.71520048,  0.57100522,
         0.37205284,  0.59726293,  0.58224992,  0.58690505,  0.4791199 ],
       [ 0.35219557,  0.34954002,  0.93837312,  0.2745864 ,  0.89569075,
         0.81244084,  0.09661341,  0.80673646,  0.83756759,  0.7948081 ],
       [ 0.09173706,  0.86250006,  0.22121994,  0.21097563,  0.55090202,
         0.80954817,  0.97159981,  0.95888693,  0.43151554,  0.2265607 ],
       [ 0.00723128,  0.95690539,  0.94214806,  0.01721733,  0.12552314,
         0.65977765,  0.20845669,  0.44663729,  0.98392716,  0.36258081],
       [ 0.65994805,  0.47697842,  0.35449045,  0.73937445,  0.68578224,
         0.44278095,  0.86743906,  0.5126411 ,  0.75683392,  0.73354572],
       [ 0.4814301 ,  0.92410622,  0.85267402,  0.44856078,  0.03887269,
         0.48868498,  0.83618382,  0.49404473,  0.37328248,  0.18134919],
       [ 0.63999748,  0.48718656,  0.54826717,  0.1001681 ,  0.1940816 ,
         0.3937014 ,  0.48768013,  0.70610649,  0.03213063,  0.88371607]])

In [6]: mtr = np.where(mtr>0.5, 0, mtr)

In [7]: %clear


In [8]: mtr
Out[8]: 
array([[ 0.        ,  0.        ,  0.        ,  0.12304856,  0.13804963,
         0.30867502,  0.        ,  0.00797898,  0.1060602 ,  0.        ],
       [ 0.        ,  0.40209901,  0.35274404,  0.        ,  0.        ,
         0.380625  ,  0.16432029,  0.        ,  0.0678564 ,  0.42875591],
       [ 0.42343761,  0.31957986,  0.        ,  0.04898903,  0.2908878 ,
         0.13160296,  0.26938537,  0.        ,  0.        ,  0.4511198 ],
       [ 0.        ,  0.33421621,  0.09218392,  0.        ,  0.        ,
         0.37205284,  0.        ,  0.        ,  0.        ,  0.4791199 ],
       [ 0.35219557,  0.34954002,  0.        ,  0.2745864 ,  0.        ,
         0.        ,  0.09661341,  0.        ,  0.        ,  0.        ],
       [ 0.09173706,  0.        ,  0.22121994,  0.21097563,  0.        ,
         0.        ,  0.        ,  0.        ,  0.43151554,  0.2265607 ],
       [ 0.00723128,  0.        ,  0.        ,  0.01721733,  0.12552314,
         0.        ,  0.20845669,  0.44663729,  0.        ,  0.36258081],
       [ 0.        ,  0.47697842,  0.35449045,  0.        ,  0.        ,
         0.44278095,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.4814301 ,  0.        ,  0.        ,  0.44856078,  0.03887269,
         0.48868498,  0.        ,  0.49404473,  0.37328248,  0.18134919],
       [ 0.        ,  0.48718656,  0.        ,  0.1001681 ,  0.1940816 ,
         0.3937014 ,  0.48768013,  0.        ,  0.03213063,  0.        ]])

Given such a sparse ndarray, how can I select 20% of the non-zero elements and remember their positions?

Recommended answer

We'll use numpy.random.choice. First, we get arrays of the (i, j) indices where the data is nonzero (x below is the array, mtr in the question):

i,j = np.nonzero(x)
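
As a small illustration of what np.nonzero returns, here is a sketch on a made-up 3x3 array (the values are hypothetical and not taken from the question). For a 2-D input it gives one array of row indices and one of column indices, listed in row-major order:

```python
import numpy as np

# Hypothetical toy array; zeros stand for missing values.
x = np.array([[0.0, 0.5, 0.0],
              [0.2, 0.0, 0.0],
              [0.0, 0.3, 0.7]])

# i holds the row index and j the column index of each nonzero entry.
i, j = np.nonzero(x)
print(i)  # [0 1 2 2]
print(j)  # [1 0 1 2]
```

Each pair (i[k], j[k]) is the position of one nonzero entry, so len(i) is the number of nonzero entries.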

Then we'll select 20% of these:

ix = np.random.choice(len(i), int(np.floor(0.2 * len(i))), replace=False)

Here ix is an array of random, unique indices, 20% the length of i and j (the length of i and j is the number of nonzero entries). To recover the row and column positions, we take i[ix] and j[ix], so we can select 20% of the nonzero entries of x by writing:

print(x[i[ix], j[ix]])
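
To connect this back to the train/test split in the question, the steps can be wrapped into one helper. This is only a sketch under some assumptions: the function name split_nonzero and its parameters are made up for illustration, and it uses np.random.default_rng (available in NumPy 1.17 and later) rather than np.random.choice so that the split can be seeded.

```python
import numpy as np

def split_nonzero(x, frac=0.2, seed=None):
    """Hide a random fraction of the nonzero entries of x (hypothetical helper).

    Returns the training matrix with the chosen entries zeroed out, plus the
    row indices, column indices, and original values of the hidden entries.
    """
    rng = np.random.default_rng(seed)
    i, j = np.nonzero(x)                       # positions of nonzero entries
    n_test = int(np.floor(frac * len(i)))      # how many entries to hide
    ix = rng.choice(len(i), size=n_test, replace=False)
    rows, cols = i[ix], j[ix]                  # remembered positions
    test_vals = x[rows, cols]                  # fancy indexing returns a copy
    train = x.copy()
    train[rows, cols] = 0                      # hide the test entries
    return train, rows, cols, test_vals

# Build a sparse matrix the same way as in the question.
mtr = np.random.default_rng(0).random((10, 10))
mtr = np.where(mtr > 0.5, 0, mtr)

train, rows, cols, test_vals = split_nonzero(mtr, frac=0.2, seed=1)
```

After training on train, the model's predictions at (rows, cols) can be compared against test_vals.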
