Python vs MATLAB performance on algorithm


Question

I have a performance question about two bits of code. One is implemented in Python and one in MATLAB. The code calculates the sample entropy of a time series (which sounds complicated but is basically a bunch of for loops).

I am running both implementations on relatively large time series (~95k+ samples, depending on the series). The MATLAB implementation finishes the calculation in ~45 sec to 1 min. The Python one basically never finishes. I threw tqdm over the Python for loops and the outer loop was only moving at about ~1.85 s/it, which gives 50+ hours as an estimated completion time (I've let it run for 15+ min and the iteration rate was pretty consistent).

Example input and runtime:

MATLAB (~52 sec):

a = rand(1, 95000)
sampenc(a, 4, 0.1 * std(a))

Python (currently 5 mins in, with 49 hours estimated):

import numpy as np
a = np.random.rand(1, 95000)[0]
sample_entropy(a, 4, 0.1 * np.std(a))
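
As a rough sanity check on that estimate (this snippet is not from the original post and assumes the pure-Python sample_entropy function shown below), timing shorter prefixes of the series makes the roughly quadratic growth in runtime visible:

import time
import numpy as np

# Hypothetical check, assuming sample_entropy as defined below is already imported/defined.
a = np.random.rand(95000)
for n in (2000, 4000, 8000):
    t0 = time.perf_counter()
    sample_entropy(a[:n], 4, 0.1 * np.std(a[:n]))
    # elapsed time roughly quadruples each time n doubles
    print(n, "samples:", round(time.perf_counter() - t0, 2), "s")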

Python implementation:

# https://github.com/nikdon/pyEntropy
def sample_entropy(time_series, sample_length, tolerance=None):
    """Calculate and return Sample Entropy of the given time series.
    Distance between two vectors defined as Euclidean distance and can
    be changed in future releases
    Args:
        time_series: Vector or string of the sample data
        sample_length: Number of sequential points of the time series
        tolerance: Tolerance (default = 0.1...0.2 * std(time_series))
    Returns:
        Vector containing Sample Entropy (float)
    References:
        [1] http://en.wikipedia.org/wiki/Sample_Entropy
        [2] http://physionet.incor.usp.br/physiotools/sampen/
        [3] Madalena Costa, Ary Goldberger, CK Peng. Multiscale entropy analysis
            of biological signals
    """
    if tolerance is None:
        tolerance = 0.1 * np.std(time_series)

    n = len(time_series)
    prev = np.zeros(n)
    curr = np.zeros(n)
    A = np.zeros((sample_length, 1))  # number of matches for m = [1,...,template_length - 1]
    B = np.zeros((sample_length, 1))  # number of matches for m = [1,...,template_length]

    for i in range(n - 1):
        nj = n - i - 1
        ts1 = time_series[i]
        for jj in range(nj):
            j = jj + i + 1
            if abs(time_series[j] - ts1) < tolerance:  # distance between two vectors
                curr[jj] = prev[jj] + 1
                temp_ts_length = min(sample_length, curr[jj])
                for m in range(int(temp_ts_length)):
                    A[m] += 1
                    if j < n - 1:
                        B[m] += 1
            else:
                curr[jj] = 0
        for j in range(nj):
            prev[j] = curr[j]

    N = n * (n - 1) / 2
    B = np.vstack(([N], B[:sample_length - 1]))
    similarity_ratio = A / B
    se = - np.log(similarity_ratio)
    se = np.reshape(se, -1)
    return se

MATLAB implementation:

function [e,A,B]=sampenc(y,M,r);
%function [e,A,B]=sampenc(y,M,r);
%
%Input
%
%y input data
%M maximum template length
%r matching tolerance
%
%Output
%
%e sample entropy estimates for m=0,1,...,M-1
%A number of matches for m=1,...,M
%B number of matches for m=0,...,M-1 excluding last point

n=length(y);
lastrun=zeros(1,n);
run=zeros(1,n);
A=zeros(M,1);
B=zeros(M,1);
p=zeros(M,1);
e=zeros(M,1);

for i=1:(n-1)
   nj=n-i;
   y1=y(i);
   for jj=1:nj
      j=jj+i;      
      if abs(y(j)-y1)<r
         run(jj)=lastrun(jj)+1;
         M1=min(M,run(jj));
         for m=1:M1           
            A(m)=A(m)+1;
            if j<n
               B(m)=B(m)+1;
            end            
         end
      else
         run(jj)=0;
      end      
   end
   for j=1:nj
      lastrun(j)=run(j);
   end
end
N=n*(n-1)/2;
B=[N;B(1:(M-1))];
p=A./B;
e=-log(p);

I've also tried a few other Python implementations and all of them have the same slow result:

vectorized-sample-entropy

sampen

sampen2.py

Wikipedia sample entropy implementation

I don't think it's a computer issue, as it runs relatively fast in MATLAB.

As far as I can tell, implementation-wise both sets of code are the same. I have no idea why the Python implementations are so slow. I would understand a difference of a few seconds, but not such a large discrepancy. Let me know your thoughts on why this is, or any suggestions on how to improve the Python versions.

BTW: I'm using Python 3.6.5 with numpy 1.14.5 and MATLAB R2018a.

Answer

As said in the comments, MATLAB uses a JIT compiler by default and Python doesn't. In Python you could use Numba to do much the same.

Your code with slight modifications:

import numba as nb
import numpy as np
import time

@nb.jit(fastmath=True,error_model='numpy')
def sample_entropy(time_series, sample_length, tolerance=None):
    """Calculate and return Sample Entropy of the given time series.
    Distance between two vectors defined as Euclidean distance and can
    be changed in future releases
    Args:
        time_series: Vector or string of the sample data
        sample_length: Number of sequential points of the time series
        tolerance: Tolerance (default = 0.1...0.2 * std(time_series))
    Returns:
        Vector containing Sample Entropy (float)
    References:
        [1] http://en.wikipedia.org/wiki/Sample_Entropy
        [2] http://physionet.incor.usp.br/physiotools/sampen/
        [3] Madalena Costa, Ary Goldberger, CK Peng. Multiscale entropy analysis
            of biological signals
    """
    if tolerance is None:
        tolerance = 0.1 * np.std(time_series)

    n = len(time_series)
    prev = np.zeros(n)
    curr = np.zeros(n)
    A = np.zeros((sample_length))  # number of matches for m = [1,...,template_length - 1]
    B = np.zeros((sample_length))  # number of matches for m = [1,...,template_length]

    for i in range(n - 1):
        nj = n - i - 1
        ts1 = time_series[i]
        for jj in range(nj):
            j = jj + i + 1
            if abs(time_series[j] - ts1) < tolerance:  # distance between two vectors
                curr[jj] = prev[jj] + 1
                temp_ts_length = min(sample_length, curr[jj])
                for m in range(int(temp_ts_length)):
                    A[m] += 1
                    if j < n - 1:
                        B[m] += 1
            else:
                curr[jj] = 0
        for j in range(nj):
            prev[j] = curr[j]

    N = n * (n - 1) // 2

    B2=np.empty(sample_length)
    B2[0]=N
    B2[1:]=B[:sample_length - 1]
    similarity_ratio = A / B2
    se = - np.log(similarity_ratio)
    return se

Timings

a = np.random.rand(1, 95000)[0] #Python
a = rand(1, 95000) #Matlab
Python 3.6, Numba 0.40dev, Matlab 2016b, Core i5-3210M

Python:       487s
Python+Numba: 12.2s
Matlab:       71.1s
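
The answer does not show its timing harness; a minimal sketch of how numbers like these could be measured follows (an assumption, not the answerer's code). Note that the first call to the Numba-decorated function includes compilation time, so it is warmed up before timing:

import time
import numpy as np

a = np.random.rand(95000)

sample_entropy(a, 4, 0.1 * np.std(a))        # warm-up call: triggers Numba's JIT compilation

t0 = time.perf_counter()
se = sample_entropy(a, 4, 0.1 * np.std(a))   # timed call: runs the already-compiled code
print("elapsed:", round(time.perf_counter() - t0, 2), "s")
print(se)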
