非常大和非常稀疏的非负矩阵分解 [英] Very Large and Very Sparse Non Negative Matrix factorization
问题描述
我有一个非常大且稀疏的矩阵(531K x 315K),总细胞数约为1670亿.非零值仅为1s.非零值的总数约为45K.是否有有效的NMF软件包来解决我的问题?我知道有几个软件包,它们仅适用于较小的数据矩阵.任何想法都可以.预先感谢.
I have a very large and also sparse matrix (531K x 315K), the number of total cells is ~167 Billion. The non-zero values are only 1s. Total number of non-zero values are around 45K. Is there an efficient NMF package to solve my problem? I know there are couple of packages for that and they are working well only for small size of data matrix. Any idea helps. Thanks in advance.
推荐答案
X-shape: (531000, 315000) X nnzs: 45000
type(X): <class 'scipy.sparse.csr.csr_matrix'>
violation: 1.0
violation: 0.2318929397542804
violation: 0.11045394409727402
violation: 0.08104138988253409
...
violation: 9.659665625799714e-05
Converged at iteration 71
Used (secs): 247.94092973091756
122.27109041
70
备注:
- 确保您使用稀疏矩阵作为输入,否则您将无法利用稀疏性
- 我正在使用版本 0.19.1 ,因此使用了 multiplicative-update 求解器(> = 0.19)
- 但是较旧的基于CD的求解器也应该处理这个问题!
- Make sure you use sparse-matrices as input or you can't exploit sparsity
- I'm using version 0.19.1, so the multiplicative-update solver is used (>= 0.19)
- But the older CD-based solver should handle this too!
如评论中所述,OP希望添加其他约束,同时仍未正式指定这些约束.
As mentioned in the comments, OP wants to add additional constraints, while still not specifying these formally.
这将需要对优化过程进行全新的实现,包括一些理论上的工作(取决于约束).
This will need a whole new implementation of some optimization-procedure including some theory-footwork (depending on the constraints).
作为替代方案,这可以通过通用凸编程求解器解决.例如.由cvxpy制定并由SCS解决.当然,也需要进行交替最小化过程(因为联合问题是非凸的),并且比这种专门的sklearn实现,其伸缩性更差.但这可能适用于OP数据.
As an alternative, this can be solved by general-purpose Convex-Programming solvers. E.g. formulated by cvxpy and solved by SCS. Of course the alternating-minimization procedure needs to be done too (as the joint-problem is non-convex) and it will scale worse than this specialized sklearn-implementation. But it might work for OPs data.
这篇关于非常大和非常稀疏的非负矩阵分解的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
Remarks: