What to replace loops and nested if sentences with in order to speed up Python code?

Question

How can I avoid for loops and nested if statements and be more Pythonic?

At first glance this may seem like a "please do all of my work for me" question. I can assure you that it is not. I'm trying to learn some real Python, and would like to discover ways of speeding up code based on a reproducible example and a pre-defined function.

I'm calculating returns from following certain signals in financial markets using loads of for loops and nested if statements. I have made several attempts, but I am just getting nowhere with vectorizing or comprehensions or other more Pythonic tools of the trade. I've been OK with that so far, but finally I'm starting to feel the pain of using functions that are simply too slow at scale.

I have a data frame with two indicators and one particular event. The first two code snippets are included to show the procedure step by step. I've included the complete thing with some predefined settings and a function at the very end.

In [1]

# Settings
import numpy as np
import pandas as pd
import datetime
np.random.seed(12345678)

Observations = 10

# Data frame values:
# Two indicators with values between 0 and 9
# and one Event which does or does not occur with values 0 or 1
df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                  columns=['IndicatorA', 'IndicatorB'] )
df['Event'] = np.random.randint(0,2,size=Observations)

# Data frame index:
# (pd.datetime was removed in pandas 1.0; use the datetime module instead)
datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                         periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])

# Placeholder for signals based on the existing values
# in the data frame
df['Signal'] = 0

print(df)

Out [1]

The data frame is indexed by dates. The signal I'm looking for is determined by the interaction of these indicators and events. The Signal is calculated in the following way (expanding on the snippet above):

In [2]

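# .ix was removed in pandas 1.0; .loc with df.index[i] selects the same row
i = 0
for signals in df['Signal']:
    if i == 0:
        # First signal is always zero
        df.loc[df.index[i],'Signal'] = 0
    else:
        # Signal is 1 if Indicator A is above a certain level
        if df.loc[df.index[i],'IndicatorA'] > 5:
            df.loc[df.index[i],'Signal'] = 1
        else:
            # Signal is 1 if Indicator B is above a certain level
            # AND a certain event occurs
            # (parentheses keep the two comparisons separate: & binds tighter than >)
            if (df.loc[df.index[i - 1],'IndicatorB'] > 5) and (df.loc[df.index[i],'Event'] > 1):
                df.loc[df.index[i],'Signal'] = 1
            else:
                df.loc[df.index[i],'Signal'] = 0
    i = i + 1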

print(df['Signal'])

Out [2]

Below is the whole thing defined as a function. Notice that the function returns the average of the Signal instead of the Signal column itself. This way the console is not cluttered when the code is run, and we can test the efficiency of the code using %time in IPython.

# Settings
import numpy as np
import pandas as pd
import datetime

# The whole thing defined as a function

def fxSlow(Observations):

    np.random.seed(12345678)

    df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                        columns=['IndicatorA', 'IndicatorB'] )
    df['Event'] = np.random.randint(0,2,size=Observations)

    # pd.datetime was removed in pandas 1.0; use the datetime module instead
    datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                periods=Observations).tolist()
    df['Signal'] = 0

    df['Dates'] = datelist
    df = df.set_index(['Dates'])

    # .ix was removed in pandas 1.0; .loc with df.index[i] selects the same row
    i = 0
    for signals in df['Signal']:
        if i == 0:
            # First signal is always zero
            df.loc[df.index[i],'Signal'] = 0
        else:
            # Signal is 1 if Indicator A is above a certain level
            if df.loc[df.index[i],'IndicatorA'] > 5:
                df.loc[df.index[i],'Signal'] = 1
            else:
                # Signal is 1 if Indicator B is above a certain level
                # AND a certain event occurs
                # (parentheses keep the two comparisons separate: & binds tighter than >)
                if (df.loc[df.index[i - 1],'IndicatorB'] > 5) and (df.loc[df.index[i],'Event'] > 1):
                    df.loc[df.index[i],'Signal'] = 1
                else:
                    df.loc[df.index[i],'Signal'] = 0
        i = i + 1

    return np.mean(df['Signal'])
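
For readers who are not working in IPython, a rough stand-in for %time can be sketched with the standard library. The helper name time_call and the placeholder workload slow_sum are hypothetical and only keep the example self-contained; in practice fxSlow would be passed in instead.

```python
import time


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def slow_sum(n):
    # Hypothetical placeholder workload: a plain Python loop.
    total = 0
    for i in range(n):
        total += i
    return total


for n in (100, 1000, 10000):
    result, elapsed = time_call(slow_sum, n)
    print(f'n={n}: {elapsed * 1000:.3f} ms')
```

A single call gives a noisy number; the timeit module repeats the call and is usually a better choice for small workloads.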

Below you can see the results of running the function with different numbers of observations (sizes of the data frame):

So, how can I speed things up by being more Pythonic?

And as a bonus question, what causes the error when I increase the number of observations to 100000?

Recommended answer

Can you try something like this?

def fxSlow2(Observations):

    np.random.seed(12345678)

    df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                        columns=['IndicatorA', 'IndicatorB'] )
    df['Event'] = np.random.randint(0,2,size=Observations)

    # pd.datetime was removed in pandas 1.0; use the datetime module instead
    datelist = pd.date_range(datetime.datetime.today().strftime('%Y-%m-%d'),
                periods=Observations).tolist()
    df['Signal'] = 0

    df['Dates'] = datelist
    df = df.set_index(['Dates'])

    # Nested np.where replaces the nested if/else.
    # shift(1) lines each row up with the previous row's IndicatorB,
    # which matches df.ix[i - 1, 'IndicatorB'] in the loop.
    df['Signal'] = (np.where(df.IndicatorA > 5,
          1,
          np.where( (df.IndicatorB.shift(1) > 5) & (df.Event > 1),
                    1,
                    0)
          )
    )

    # First signal is always zero
    df.loc[df.index[0],'Signal'] = 0

    return np.mean(df['Signal'])
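
One way to gain confidence in a vectorized rewrite like this is to run it next to a plain loop on the same data and compare the results. The following is a minimal sketch under some assumptions: the function names make_frame, signal_loop and signal_vectorized are hypothetical, the date index is left out because the rule does not depend on it, and the rule is taken from the comments in the question (IndicatorA above 5, or the previous row's IndicatorB above 5 together with Event above 1).

```python
import numpy as np
import pandas as pd


def make_frame(observations, seed=12345678):
    """Random test frame shaped like the one in the question (no date index)."""
    rng = np.random.RandomState(seed)
    df = pd.DataFrame(rng.randint(0, 10, size=(observations, 2)),
                      columns=['IndicatorA', 'IndicatorB'])
    df['Event'] = rng.randint(0, 2, size=observations)
    return df


def signal_loop(df):
    """Row-by-row reference implementation of the rule."""
    a = df['IndicatorA'].to_numpy()
    b = df['IndicatorB'].to_numpy()
    e = df['Event'].to_numpy()
    out = np.zeros(len(df), dtype=int)
    for i in range(1, len(df)):  # the first signal stays zero
        if a[i] > 5 or (b[i - 1] > 5 and e[i] > 1):
            out[i] = 1
    return out


def signal_vectorized(df):
    """Same rule with boolean masks instead of a loop."""
    cond_a = df['IndicatorA'] > 5
    # shift(1) moves values down one row, so row i sees row i - 1.
    # Each comparison is parenthesized because & binds tighter than >.
    cond_b = (df['IndicatorB'].shift(1) > 5) & (df['Event'] > 1)
    signal = (cond_a | cond_b).astype(int)
    signal.iloc[0] = 0  # the first signal is always zero
    return signal.to_numpy()


df = make_frame(1000)
assert (signal_loop(df) == signal_vectorized(df)).all()
print(signal_vectorized(df).mean())
```

Note that Event only takes the values 0 and 1 in this test data, so the Event > 1 branch never fires and the signal is driven by IndicatorA alone. If the intent is "the event occurred", a comparison such as Event == 1 may be what is wanted, but that is an assumption about the asker's intent and it would change the reported averages.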

%time fxSlow2(100)

Wall time: 10 ms

Out[208]: 0.43

%time fxSlow2(1000)

Wall time: 15 ms

Out[209]: 0.414

%time fxSlow2(10000)

Wall time: 61 ms

Out[210]: 0.4058
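
As for the bonus question, the error message is not shown, so the following is only a likely explanation. Both functions build a daily date range starting today, and 100000 days is roughly 274 years. pandas timestamps with nanosecond resolution end in the year 2262, so the last dates may not be representable. The sketch below checks this with plain date arithmetic; the start date is a hypothetical stand-in for "today".

```python
import datetime

import pandas as pd

start = datetime.date(2019, 1, 1)  # hypothetical stand-in for "today"
last_representable = pd.Timestamp.max.date()
print('Timestamp.max:', pd.Timestamp.max)

fits_by_periods = {}
for periods in (10_000, 100_000):
    # Last date a daily range with this many periods would need.
    last_needed = start + datetime.timedelta(days=periods - 1)
    fits_by_periods[periods] = last_needed <= last_representable
    print(periods, last_needed, fits_by_periods[periods])

# Whether this actually raises depends on the pandas version and the
# datetime resolution it picks, so both outcomes are handled.
try:
    pd.date_range(start.isoformat(), periods=100_000)
    print('no error on this pandas version')
except (ValueError, OverflowError) as err:
    print(type(err).__name__)
```

If this is indeed the cause, dropping the date index for the benchmark or using a coarser frequency would avoid the problem.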
