如何在python中优化对矩阵的数学运算 [英] How to optimize math operations on matrix in python

查看:90
本文介绍了如何在python中优化对矩阵的数学运算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试减少使用两个矩阵执行一系列计算的函数的时间.搜索此消息,我听说过numpy,但我真的不知道如何将其应用于我的问题.另外,我认为使我的函数变慢的原因之一是有许多点运算符(我在

I am trying to reduce the time of a function that performs a serie of calculations with two matrix. Searching for this, I've heard of numpy, but I really do not know how apply it to my problem. Also, I Think one of the things is making my function slow is having many dots operators (I heard of that in this this page ).

数学与二次分配问题的因式分解相对应:

The math correspond with a factorization for the Quadratic assignment problem:

我的代码是:

    delta = 0
    for k in xrange(self._tam):
        if k != r and k != s:
            delta +=
                self._data.stream_matrix[r][k] \
                * (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) + \
                self._data.stream_matrix[s][k] \
                * (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]) + \
                self._data.stream_matrix[k][r] \
                * (self._data.distance_matrix[sol[k]][sol[s]] - self._data.distance_matrix[sol[k]][sol[r]]) + \
                self._data.stream_matrix[k][s] \
                * (self._data.distance_matrix[sol[k]][sol[r]] - self._data.distance_matrix[sol[k]][sol[s]])
    return delta

在大小为20(矩阵为20x20)的问题上运行此代码大约需要20段,瓶颈就在此功能中

Running this on a problem of size 20 (Matrix of 20x20) take about 20 segs, the bottleneck is in this function

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
303878   15.712    0.000   15.712    0.000 Heuristic.py:66(deltaC)

我试图将map应用于for循环,但是由于循环体不是函数调用,因此是不可能的.

I tried to apply map to the for loop, but because the loop body isn't a function call, it is not possible.

如何减少时间?

回答艾肯伯格的评论:

sol是一个排列,例如[1,2,3,4].在生成邻居解决方案时会调用该函数,因此[1,2,3,4]的邻居为[2,1,3,4].我仅更改原始置换中的两个位置,然后调用deltaC,该计算将交换位置r,s的解的因式分解(在上面的示例中,r,s = 0,1).进行此排列是为了避免计算邻居解决方案的整个成本.我想我可以将sol[k,r,s]的值存储在局部变量中,以避免在每次迭代中查找它的值. 我不知道这是否是您在评论中要问的内容.

sol is a permutation, for example [1,2,3,4]. the function is called when I am generating neighbor solutions, so, a neighbor of [1,2,3,4] is [2,1,3,4]. I am changing only two positions in the original permutation and then call deltaC, which calculates a factorization of the solution with positions r,s swaped (In the example above r,s = 0,1). This permutation is made to avoid calculate the entire cost of the neighbor solution. I suppose I can store the values of sol[k,r,s] in a local variable to avoid looking up its value in each iteration. I do not know if this is what you was asking in your comment.

一个最小的工作示例:

import random


distance_matrix = [[0, 12, 6, 4], [12, 0, 6, 8], [6, 6, 0, 7], [4, 8, 7, 0]]
stream_matrix = [[0, 3, 8, 3], [3, 0, 2, 4], [8, 2, 0, 5], [3, 4, 5, 0]]

def deltaC(r, s, S=None):
    '''
    Difference between C with values i and j swapped
    '''

    S = [0,1,2,3]

    if S is not None:
        sol = S
    else:
        sol = S

    delta = 0

    sol_r, sol_s = sol[r], sol[s]

    for k in xrange(4):
        if k != r and k != s:
            delta += (stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s]))
    return delta


for _ in xrange(303878):
    d = deltaC(random.randint(0,3), random.randint(0,3))
print d

现在,我认为更好的选择是使用NumPy.我尝试使用Matrix(),但没有提高性能.

Now I think the better option is use NumPy. I tried with Matrix(), but did not improve the performance.

好吧,最后,我可以将@TooTone的解决方案与将索引存储在一个集中以避免if的结合,从而减少了更多时间.时间从大约18秒减少到8秒.这是代码:

Well, Finally I was able to reduce the time a bit more combining @TooTone's solution and storing the indexes in a set to avoid the if. The time has dropped from about 18 seconds to 8 seconds. Here is the code:

def deltaC(self, r, s, sol=None):
    delta = 0
    sol = self.S if sol is None else self.S
    sol_r, sol_s = sol[r], sol[s]

    stream_matrix = self._data.stream_matrix
    distance_matrix = self._data.distance_matrix

    indexes = set(xrange(self._tam)) - set([r, s])

    for k in indexes:
        sol_k = sol[k]
        delta += \
            (stream_matrix[r][k] - stream_matrix[s][k]) \
            * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
            + \
            (stream_matrix[k][r] - stream_matrix[k][s]) \
            * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta

为了进一步减少时间,我认为最好的方法是编写一个模块.

In order to reduce the time even more, I think the best way would be write a module.

推荐答案

在您给出的简单示例中,使用for k in xrange(4):,循环主体仅执行两次(如果r==s)或三次(如果),而下面的初始numpy实施则要慢很多. Numpy已针对在长向量上执行计算进行了优化,如果向量较短,则开销可能会超过收益. (请注意,在此公式中,矩阵被切成不同的维度,并且不连续地建立索引,这只会使向量化实现变得更加复杂).

In the simple example you've given, with for k in xrange(4): the loop body only executes twice (if r==s), or three times (if r!=s) and an initial numpy implementation, below, is slower by a large factor. Numpy is optimized for performing calculations over long vectors and if the vectors are short the overheads can outweigh the benefits. (And note in this formula, the matrices are being sliced in different dimensions, and indexed non-contiguously, which can only make things more complicated for a vectorizing implementation).

import numpy as np

distance_matrix_np = np.array(distance_matrix)
stream_matrix_np = np.array(stream_matrix)
n = 4

def deltaC_np(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]

    K = np.array([i for i in xrange(n) if i!=r and i!=s])

    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
        *  (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
        (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))

在此numpy实现中,不是对K中的元素进行for循环,而是对numpy中K中的所有元素进行了操作.另外,请注意,您的数学表达式可以简化.左括号中的每个术语是右括号中的术语的负数.

In this numpy implementation, rather than a for loop over the elements in K, the operations are applied across all the elements in K within numpy. Also, note that your mathematical expression can be simplified. Each term in brackets on the left is the negative of the term in brackets on the right.

这也适用于您的原始代码.例如,(self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]])等于(self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]])的-1倍,因此您进行了不必要的计算,并且可以在不使用numpy的情况下优化原始代码.

This applies to your original code too. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) is equal to -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]), so you were doing unnecessary computation, and your original code can be optimized without using numpy.

事实证明,numpy函数的瓶颈是无辜的列表理解力

It turns out that the bottleneck in the numpy function is the innocent-looking list comprehension

K = np.array([i for i in xrange(n) if i!=r and i!=s])

将其替换为矢量化代码

if r==s:
    K=np.arange(n-1)
    K[r:] += 1
else:
    K=np.arange(n-2)
    if r<s:
        K[r:] += 1
        K[s-1:] += 1
    else:
        K[s:] += 1
        K[r-1:] += 1

numpy函数的速度快了许多[em] .

the numpy function is much faster.

运行时间图直接显示在下面(该答案的右下角是优化numpy函数之前的原始图).您会发现,根据矩阵的大小,使用优化的原始代码或numpy代码是有意义的.

A graph of run times is shown immediately below (right at the bottom of this answer is the original graph before optimizing the numpy function). You can see that it either makes sense to use your optimized original code or the numpy code, depending on how large the matrix is.

下面提供了完整的代码,以供参考,部分是为了防止其他人进一步使用它. (功能deltaC2是您的原始代码,经过优化,以考虑到可以简化数学表达式的方式.)

The full code is below for reference, partly in case someone else can take it further. (The function deltaC2 is your original code optimized to take account of the way the mathematical expression can be simplified.)

def deltaC(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            delta += \
                stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])
    return delta

import numpy as np

def deltaC_np(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]

    if r==s:
        K=np.arange(n-1)
        K[r:] += 1
    else:
        K=np.arange(n-2)
        if r<s:
            K[r:] += 1
            K[s-1:] += 1
        else:
            K[s:] += 1
            K[r-1:] += 1
    #K = np.array([i for i in xrange(n) if i!=r and i!=s]) #TOO SLOW

    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
        *  (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
        (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))

def deltaC2(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            sol_k = sol[k]
            delta += \
                (stream_matrix[r][k] - stream_matrix[s][k]) \
                * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
                + \
                (stream_matrix[k][r] - stream_matrix[k][s]) \
                * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta


import time

N=200

elapsed1s = []
elapsed2s = []
elapsed3s = []
ns = range(10,410,10)
for n in ns:
    distance_matrix_np=np.random.uniform(0,n**2,size=(n,n))
    stream_matrix_np=np.random.uniform(0,n**2,size=(n,n))
    distance_matrix=distance_matrix_np.tolist()
    stream_matrix=stream_matrix_np.tolist()
    sol  = range(n-1,-1,-1)
    sol_np  = np.array(range(n-1,-1,-1))

    Is = np.random.randint(0,n-1,4)
    Js = np.random.randint(0,n-1,4)

    total1 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total1 += deltaC(i,j, sol)
    elapsed1 = (time.clock() - start)
    start = time.clock()

    total2 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total2 += deltaC_np(i,j, sol_np)
    elapsed2 = (time.clock() - start)

    total3 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total3 += deltaC2(i,j, sol_np)
    elapsed3 = (time.clock() - start)

    print n, elapsed1, elapsed2, elapsed3, total1, total2, total3
    elapsed1s.append(elapsed1)
    elapsed2s.append(elapsed2)
    elapsed3s.append(elapsed3)

    #Check errors of one method against another
    #err = 0
    #for i in range(min(n,50)):
    #    for j in range(min(n,50)):
    #        err += np.abs(deltaC(i,j,sol)-deltaC_np(i,j,sol_np))
    #print err
import matplotlib.pyplot as plt

plt.plot(ns, elapsed1s, label='Original',lw=2)
plt.plot(ns, elapsed3s, label='Optimized',lw=2)
plt.plot(ns, elapsed2s, label='numpy',lw=2)
plt.legend(loc='upper left', prop={'size':16})
plt.xlabel('matrix size')
plt.ylabel('time')
plt.show()

这是优化deltaC_np

这篇关于如何在python中优化对矩阵的数学运算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆