Matrix completion in Python

Question

Say I have a matrix:

> import numpy as np
> a = np.random.random((5,5))

array([[ 0.28164485,  0.76200749,  0.59324211,  0.15201506,  0.74084168],
       [ 0.83572213,  0.63735993,  0.28039542,  0.19191284,  0.48419414],
       [ 0.99967476,  0.8029097 ,  0.53140614,  0.24026153,  0.94805153],
       [ 0.92478   ,  0.43488547,  0.76320656,  0.39969956,  0.46490674],
       [ 0.83315135,  0.94781119,  0.80455425,  0.46291229,  0.70498372]])

Then I punch some holes in it with np.nan, e.g.:

> a[(1,4,0,3),(2,4,2,0)] = np.nan

array([[ 0.80327707,  0.87722234,         nan,  0.94463778,  0.78089194],
       [ 0.90584284,  0.18348667,         nan,  0.82401826,  0.42947815],
       [ 0.05913957,  0.15512961,  0.08328608,  0.97636309,  0.84573433],
       [        nan,  0.30120861,  0.46829231,  0.52358888,  0.89510461],
       [ 0.19877877,  0.99423591,  0.17236892,  0.88059185,        nan ]])

I would like to fill in the NaN entries using information from the remaining entries of the matrix. One example would be to replace each NaN with the mean of the column in which it occurs.
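The column-mean fill described above can be sketched in plain NumPy. The helper name `fill_with_column_means` and the fixed random seed are illustrative choices, not part of the original question.

```python
import numpy as np

def fill_with_column_means(a):
    """Return a copy of `a` with each NaN replaced by the mean of its column."""
    a = np.asarray(a, dtype=float)
    # nanmean ignores NaNs; a column that is entirely NaN would stay NaN.
    col_means = np.nanmean(a, axis=0)
    # Broadcast the per-column means across rows and use them where a is NaN.
    return np.where(np.isnan(a), col_means, a)

rng = np.random.default_rng(0)
a = rng.random((5, 5))
a[(1, 4, 0, 3), (2, 4, 2, 0)] = np.nan
filled = fill_with_column_means(a)
```

Entries that were already present are left untouched; only the four holes receive their column's mean.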

More generally, are there any libraries in Python for matrix completion? (e.g. something along the lines of Candes & Recht's convex optimization method).
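To make the low-rank idea concrete, here is a minimal NumPy sketch of an iterative soft-thresholded SVD, in the style of Soft-Impute. It is not Candes & Recht's exact convex program (nuclear-norm minimization with exact agreement on the observed entries), only a related heuristic that shrinks singular values. The function name `soft_impute` and the parameters `lam`, `n_iter` and `tol` are assumptions made for illustration.

```python
import numpy as np

def soft_impute(x, lam=0.1, n_iter=200, tol=1e-6):
    """Fill NaNs in `x` using repeated soft-thresholded SVDs (sketch only)."""
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    z = np.zeros_like(x)  # current low-rank estimate
    for _ in range(n_iter):
        # Keep observed entries; use the current estimate for missing ones.
        filled = np.where(observed, x, z)
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        z_new = (u * s) @ vt
        # Stop once the estimate has essentially stopped changing.
        converged = np.linalg.norm(z_new - z) <= tol * max(np.linalg.norm(z), 1.0)
        z = z_new
        if converged:
            break
    # Observed entries are returned exactly as given.
    return np.where(observed, x, z)

# Synthetic rank-1 matrix with roughly 20% of entries hidden.
rng = np.random.default_rng(0)
truth = np.outer(rng.standard_normal(20), rng.standard_normal(20))
mask = rng.random(truth.shape) < 0.2
x = truth.copy()
x[mask] = np.nan
completed = soft_impute(x)
```

Larger `lam` pushes the result toward lower rank; how well the missing entries are recovered depends on the true rank and on how many entries are observed.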

This problem appears often in machine learning, for example when working with missing features in classification/regression, or in collaborative filtering (see, e.g., the Netflix Problem on Wikipedia).

Recommended answer

If you install the latest scikit-learn, version 0.14a1, you can use its shiny new Imputer class:

>>> import numpy as np
>>> from sklearn.preprocessing import Imputer
>>> imp = Imputer(strategy="mean")
>>> a = np.random.random((5,5))
>>> a[(1,4,0,3),(2,4,2,0)] = np.nan
>>> a
array([[ 0.77473361,  0.62987193,         nan,  0.11367791,  0.17633671],
       [ 0.68555944,  0.54680378,         nan,  0.64186838,  0.15563309],
       [ 0.37784422,  0.59678177,  0.08103329,  0.60760487,  0.65288022],
       [        nan,  0.54097945,  0.30680838,  0.82303869,  0.22784574],
       [ 0.21223024,  0.06426663,  0.34254093,  0.22115931,         nan]])
>>> a = imp.fit_transform(a)
>>> a
array([[ 0.77473361,  0.62987193,  0.24346087,  0.11367791,  0.17633671],
       [ 0.68555944,  0.54680378,  0.24346087,  0.64186838,  0.15563309],
       [ 0.37784422,  0.59678177,  0.08103329,  0.60760487,  0.65288022],
       [ 0.51259188,  0.54097945,  0.30680838,  0.82303869,  0.22784574],
       [ 0.21223024,  0.06426663,  0.34254093,  0.22115931,  0.30317394]])

After this, you can use imp.transform to apply the same transformation to other data, using the mean that imp learned from a. Imputers tie into scikit-learn Pipeline objects, so you can use them in classification or regression pipelines.
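On current scikit-learn releases the `Imputer` class no longer exists: it was deprecated in 0.20 and removed in 0.22 in favour of `sklearn.impute.SimpleImputer`. The sketch below assumes such a release and shows both points from the paragraph above: reusing the learned means on new data, and placing the imputer in a pipeline. The second matrix `b`, the target `y` and the choice of `LinearRegression` are made up purely for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
a = rng.random((5, 5))
a[(1, 4, 0, 3), (2, 4, 2, 0)] = np.nan

imp = SimpleImputer(strategy="mean")
a_filled = imp.fit_transform(a)  # learns the column means from a

# New data is filled with the means learned from a, not with its own means.
b = rng.random((3, 5))
b[0, 1] = np.nan
b_filled = imp.transform(b)

# Hypothetical regression target, only to show the imputer inside a Pipeline.
y = rng.random(5)
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(a, y)          # imputation happens inside the pipeline
pred = model.predict(b)  # b is imputed with the means learned during fit
```

Keeping the imputer inside the pipeline means the fill values are always learned from the training data only, which matters when the pipeline is cross-validated.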

If you want to wait for a stable release, then 0.14 should be out next week.

Full disclosure: I'm a scikit-learn core developer.
