Decomposition of matrices for CPLEX and machine learning application


Question

I am dealing with big matrices, and from time to time my code ends with a 'killed:9' message in my terminal. I'm working on Mac OS X.

A wise programmer tells me the problem in my code is linked to the stored matrix I am dealing with.

import numpy as np

nn = 35000
dd = 35
XX = np.random.rand(nn, dd)
XX = XX.dot(XX.T)    # it should be faster than np.dot(XX, XX.T)
yy = np.random.rand(nn, 1)
XX = np.multiply(XX, yy.T)

I have to store this huge matrix XX. My guess: I split the matrix with

upp = np.triu(XX)

Do I actually save space in terms of stored data? What if later on I store

low = upp.T

Am I wasting memory and computational time?

Recommended answer

It should take up the same total amount of memory: np.triu returns a new array of the same shape with zeros below the diagonal, so upp is as large as XX. To avoid the error, you are probably looking at a few options:

  1. Process batch-wise. If you create your model through the CPLEX API, I believe the data is handled by CPLEX once you have supplied it. So you could split the data, load it piece by piece, and add it to the model consecutively.
  2. Allocate memory manually. If you use Cython, you can use the function malloc to allocate memory manually for your array; the size will then very likely be no issue.
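
Option 1 can be sketched in plain NumPy. The following is a minimal sketch, not the CPLEX workflow itself: it assumes the n x n matrix is only ever needed one block of rows at a time. The helper name gram_row_blocks, the block size, and the row-sum consumer are hypothetical; in practice the loop body would be the place where each block is handed to the model.

```python
import numpy as np


def gram_row_blocks(X, y, block_size):
    """Yield (start, block) pairs covering the rows of (X @ X.T) * y.T.

    Only one block of shape (block_size, n) is held in memory at a time,
    so the full n x n matrix is never materialized.
    """
    n = X.shape[0]
    y_row = y.reshape(1, n)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        block = X[start:stop] @ X.T   # rows start..stop-1 of X @ X.T
        block *= y_row                # same broadcasting as np.multiply(XX, yy.T)
        yield start, block


rng = np.random.default_rng(0)
n, d = 1000, 35                       # small n for illustration; the question uses 35000
X = rng.random((n, d))
y = rng.random((n, 1))

row_sums = np.empty(n)
for start, block in gram_row_blocks(X, y, block_size=256):
    # Placeholder consumer: replace with whatever needs the block,
    # e.g. adding the corresponding rows to the optimization model.
    row_sums[start:start + block.shape[0]] = block.sum(axis=1)
```

With a block size of 256 and nn = 35000, each block holds 256 * 35000 doubles, roughly 72 MB, instead of the 9.8 GB needed for the whole matrix.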

I think option 1 is the preferred one.
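
To check the claim about np.triu, the short sketch below compares byte counts on a small matrix (the size 1000 is an arbitrary choice for illustration) and works out the footprint for nn = 35000. A dense array that merely contains zeros is no smaller; only storing the n(n+1)/2 upper-triangle entries in a packed 1-D array roughly halves the memory.

```python
import numpy as np

n = 1000                              # small size, for illustration only
A = np.random.rand(n, n)

upp = np.triu(A)                      # dense n x n array, zeros below the diagonal
print(A.nbytes, upp.nbytes)           # both are n * n * 8 bytes

# Packed storage: keep only the n * (n + 1) / 2 upper-triangle entries.
# Caveat: np.triu_indices itself builds two large index arrays, so for
# very large n the triangle should be filled row by row instead.
iu = np.triu_indices(n)
packed = A[iu]
print(packed.nbytes)

# Footprint for the matrix in the question (float64 = 8 bytes per entry).
nn = 35000
full_bytes = nn * nn * 8              # dense matrix
packed_bytes = nn * (nn + 1) // 2 * 8 # packed upper triangle
print(full_bytes / 1e9, packed_bytes / 1e9)
```

Note that after the column scaling by yy.T the matrix is in general no longer symmetric, so upp.T would not reproduce its lower triangle.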

I constructed a little example. It actually combines the two options: the array is not stored as a Python object but as a C array, and the values are computed piecewise. I allocate the memory for the array using Cython and malloc. To run the code, you have to install Cython. Then you can open a Python interpreter in the directory where you saved the file and write:

import pyximport;pyximport.install()
import nameofscript

An example for handling the array:

import numpy as np
from libc.stdlib cimport malloc  # Allocate memory manually
from cython.parallel import prange  # Parallel loops without the GIL

dd = 35
# With cdef we can define C variables in Cython.
cdef double **XXN
cdef double y[35000]  # No values are assigned here; fill it before using the result
cdef int i, j, nn
nn = 35000
# Allocate memory for the matrix with 1.225 billion double elements (about 9.8 GB)
XXN = <double **>malloc(nn * sizeof(double *))
for i in range(nn):
    XXN[i] = <double *>malloc(nn * sizeof(double))

XX = np.random.rand(nn, dd)
for i in range(nn):
    for j in range(nn):
        # Compute the values for the new matrix element by element
        XXN[i][j] = XX[i].dot(XX[j])

# Multiply the new matrix with y column-wise: column j is scaled by y[j],
# matching np.multiply(XX, yy.T) in the question
for i in prange(nn, nogil=True, num_threads=4):
    for j in range(nn):
        XXN[i][j] = XXN[i][j] * y[j]

Save this file as nameofscript.pyx and run it as described above. I have briefly tested this script, and it runs for about half an hour on my machine. You can extend this script and use the result array XXN for your further computations. A little example for parallelization: I did not initialize y and did not assign any values to it. If you declare y as a C array, you can, e.g., assign values from Python objects to fill it. Then you can carry out the last multiplication without the GIL, in a parallelized manner, as shown in the code sample.

Regarding computational efficiency: this is probably not the fastest way (which may be writing your code entirely against the CPLEX C interface), but it does not throw the memory error, and it runs in an acceptable time if you do not have to repeat this computation too often.
