Python profiling: imports (and especially __init__) seem to take the most time


Problem description

I have a script that seemed to run slowly, so I profiled it using cProfile (and the visualisation tool KCacheGrind).

It seems that almost 90% of the runtime is taken by the import sequence, and especially by the running of the __init__.py files...

Here is a screenshot of the KCacheGrind output (sorry for attaching an image...):

I am not very familiar with how the import sequence works in Python, so maybe I got something confused... I also placed __init__.py files in every one of my custom-made packages; I am not sure if that is what I should have done.

Anyway, if anyone has any hints, they would be greatly appreciated!


EDIT: additional picture with the functions sorted by self:


EDIT2:

Here is the code, for more clarity for the answerers:

from strategy.strategies.gradient_stop_and_target import make_one_trade

from datetime import timedelta, datetime
import pandas as pd
from data.db import get_df, mongo_read_only, save_one, mongo_read_write, save_many
from data.get import get_symbols

from strategy.trades import make_trade, make_mae, get_prices, get_signals, \
    get_prices_subset
#from profilehooks import profile


mongo = mongo_read_only()


dollar_stop = 200
dollar_target = 400
period_change = 3


signal = get_df(mongo.signals.signals, strategy = {'$regex' : '^indicators_group'}).iloc[0]


symbol = get_symbols(mongo, description = signal['symbol'])[0]


prices = get_prices(
    signal['datetime'], 
    signal['datetime'].replace(hour = 23, minute = 59),
    symbol,
    mongo)


make_one_trade(
    signal, 
    prices, 
    symbol,             
    dollar_stop,
    dollar_target,
    period_change)

The function get_prices simply gets data from a MongoDB database, and make_one_trade does simple calculations with pandas. This never poses a problem anywhere else in my project.


EDIT3:

Here is the KCacheGrind screen when I select the 'detect cycle' option in the View tab:

Could that actually mean that there are indeed circular imports in my self-created packages that take all that time to resolve?

Solution

No. You are conflating cumulative time with time spent in the top-level code of the __init__.py file itself. The top-level code calls other methods, and those together take a lot of time.

Look at the Self column instead to find where all that time is being spent. Also see "What is the difference between tottime and cumtime in a python script profiled with cProfile?". The Incl. column is the cumulative time (cumtime, including everything the function calls), while Self is the total time (tottime, spent in the function's own code only).
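To make the difference concrete, here is a minimal sketch. The function names thin_wrapper and busy_leaf are made up for illustration, and reading the stats attribute of pstats.Stats directly is a convenience assumed here, not something taken from the question. A wrapper that does almost nothing itself still reports a large cumulative time, much like an __init__.py whose top-level code triggers further imports:

```python
import cProfile
import pstats


def busy_leaf(n=200_000):
    """Does the actual work, so nearly all of its time is 'self' time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def thin_wrapper():
    """Does almost nothing itself; its cost is inherited from busy_leaf."""
    return busy_leaf()


profiler = cProfile.Profile()
profiler.enable()
thin_wrapper()
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
timings = {key[2]: (value[2], value[3]) for key, value in stats.stats.items()}

wrapper_self, wrapper_cumulative = timings["thin_wrapper"]
leaf_self, leaf_cumulative = timings["busy_leaf"]

print(f"thin_wrapper: self={wrapper_self:.4f}s cumulative={wrapper_cumulative:.4f}s")
print(f"busy_leaf:    self={leaf_self:.4f}s cumulative={leaf_cumulative:.4f}s")
```

Sorting by cumulative time puts thin_wrapper at the top, while sorting by self time correctly points at busy_leaf.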

I'd just filter out all the <frozen importlib.*> entries; the Python project has already made sure those paths are optimised.
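If you would rather do that filtering in code than in the KCacheGrind UI, something along these lines might work. This is only a sketch: it profiles the import of the standard-library module fractions as a stand-in for your own packages, and it assumes the import machinery shows up under filenames starting with "<frozen importlib", which is how recent CPython versions report it.

```python
import cProfile
import importlib
import pstats

profiler = cProfile.Profile()
profiler.enable()
importlib.import_module("fractions")  # stand-in for the import being investigated
profiler.disable()

stats = pstats.Stats(profiler)

# Keep only entries that do not come from the frozen import machinery.
# Each value is (primitive calls, total calls, tottime, cumtime, callers).
rows = [
    (filename, funcname, value[2], value[3])
    for (filename, lineno, funcname), value in stats.stats.items()
    if not filename.startswith("<frozen importlib")
]

# Sort by self time (tottime), highest first, and show the top entries.
rows.sort(key=lambda row: row[2], reverse=True)
for filename, funcname, self_time, cumulative_time in rows[:10]:
    print(f"{self_time:.6f}s self  {cumulative_time:.6f}s incl.  {funcname} ({filename})")
```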

However, your second screenshot does show that in your profiling run, all that your Python code busied itself with was loading bytecode for modules to import (the marshal module provides the Python bytecode serialisation implementation). Either the Python program did nothing but import modules and no other work was done, or it is using some form of dynamic import that is loading a large number of modules or is otherwise ignoring the normal module caches and reloading the same module(s) repeatedly.
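For reference, this is roughly what the normal module cache does. It is a small sketch using the standard-library module colorsys as an arbitrary stand-in: a repeated import is just a lookup in sys.modules, whereas importlib.reload() deliberately bypasses that cache and executes the module's code again.

```python
import importlib
import sys
import time

MODULE_NAME = "colorsys"  # arbitrary stand-in; any importable module would do

start = time.perf_counter()
first = importlib.import_module(MODULE_NAME)
first_duration = time.perf_counter() - start

start = time.perf_counter()
second = importlib.import_module(MODULE_NAME)  # served from sys.modules
second_duration = time.perf_counter() - start

same_object = first is second
cached = MODULE_NAME in sys.modules

# reload() re-executes the module's top-level code in the existing module object.
reloaded = importlib.reload(first)

print(f"first import:  {first_duration * 1e6:.1f} us")
print(f"second import: {second_duration * 1e6:.1f} us")
print(f"same object: {same_object}, cached: {cached}")
```

If your code (or a library it uses) calls reload() or removes entries from sys.modules in a loop, every iteration pays the full import cost again.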

You can profile import times using Python 3.7's new -X importtime command-line switch, or you could use a dedicated import-profiler to find out why imports take such a long time.
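As a rough sketch of how the -X importtime output could be collected and ranked: the helper name measure_imports is made up here, and the parsing assumes the "import time: self [us] | cumulative | imported package" line format that CPython writes to stderr. It runs the statement in a fresh interpreter so that nothing is already cached:

```python
import subprocess
import sys


def measure_imports(statement):
    """Run `statement` in a fresh interpreter with -X importtime and parse stderr."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix = "import time:"
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (field.strip() for field in fields)
        if not self_us.isdigit():
            continue  # skip the header row
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


# "import json" is only a placeholder; substitute the import you want to inspect.
rows = measure_imports("import json")
slowest = sorted(rows, key=lambda row: row[1], reverse=True)[:5]
for name, self_us, cumulative_us in slowest:
    print(f"{self_us:>8} us self  {cumulative_us:>8} us cumulative  {name}")
```

For the script in the question, you would pass something like "import strategy.strategies.gradient_stop_and_target" and run it from the project root so that the package is importable.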
