How can I profile a SQLAlchemy powered application?


Question

Does anyone have experience profiling a Python/SQLAlchemy app? What is the best way to find bottlenecks and design flaws?

We have a Python application where the database layer is handled by SQLAlchemy. The application uses a batch design, so many database requests are made sequentially within a limited timespan. It currently takes a bit too long to run, so some optimization is needed. We don't use the ORM functionality, and the database is PostgreSQL.

Answer

Sometimes just plain SQL logging (enabled via Python's logging module or via the echo=True argument on create_engine()) can give you an idea how long things are taking. For example, if you log something right after a SQL operation, you'd see something like this in your log:

17:37:48,325 INFO  [sqlalchemy.engine.base.Engine.0x...048c] SELECT ...
17:37:48,326 INFO  [sqlalchemy.engine.base.Engine.0x...048c] {<params>}
17:37:48,660 DEBUG [myapp.somemessage] 
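As a sketch, the logging side of this can be wired up with the standard library alone (the logger name sqlalchemy.engine is SQLAlchemy's convention; myapp.somemessage is just an example name, as above):

```python
import logging

# Route log records to stderr with timestamps, similar to the sample above
logging.basicConfig(
    format="%(asctime)s %(levelname)-5s [%(name)s] %(message)s")

# INFO on the sqlalchemy.engine logger makes the engine emit each SQL
# statement and its parameters (the same effect as echo=True on create_engine)
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)

# An application logger used to timestamp the moment right after a query
app_log = logging.getLogger("myapp.somemessage")
app_log.setLevel(logging.DEBUG)
app_log.debug("back from the database")
```

The gap between the engine's last line and your own log line is the time the SQL round trip took.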

If you logged myapp.somemessage right after the operation, you know it took 334ms to complete the SQL part of things.

Logging SQL will also illustrate if dozens/hundreds of queries are being issued that could be better organized into far fewer queries via joins. When using the SQLAlchemy ORM, the "eager loading" feature partially (contains_eager()) or fully (eagerload(), eagerload_all()) automates this activity, but without the ORM it just means using joins so that results across multiple tables can be loaded in one result set instead of multiplying the number of queries as more depth is added (i.e. r + r*r2 + r*r2*r3 ...).
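The query-multiplication pattern versus a single join can be sketched with the stdlib sqlite3 module (the users/addresses tables here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, user_id INTEGER, email TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO addresses VALUES (1, 1, 'a@x'), (2, 1, 'a2@x'), (3, 2, 'b@x');
""")

# The r + r*r2 pattern: one query for users, then one more per user
rows_n_plus_1 = []
for (uid, name) in conn.execute("SELECT id, name FROM users"):
    for (email,) in conn.execute(
            "SELECT email FROM addresses WHERE user_id = ?", (uid,)):
        rows_n_plus_1.append((name, email))

# A single JOIN loads the same data in one result set
rows_joined = conn.execute(
    "SELECT u.name, a.email FROM users u "
    "JOIN addresses a ON a.user_id = u.id ORDER BY u.id, a.id").fetchall()

assert rows_joined == rows_n_plus_1
```

With the first approach the statement count grows with the row count, which SQL logging makes immediately visible; the join issues one statement regardless.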

If logging reveals that individual queries are taking too long, you'd need a breakdown of how much time was spent within the database processing the query, sending results over the network, being handled by the DBAPI, and finally being received by SQLAlchemy's result set and/or ORM layer. Each of these stages can present its own individual bottlenecks, depending on specifics.

For that you need to use profiling, such as cProfile or hotshot. Here is a decorator I use:

import cProfile as profiler
import gc, pstats, time

def profile(fn):
    def wrapper(*args, **kw):
        elapsed, stat_loader, result = _profile("foo.txt", fn, *args, **kw)
        stats = stat_loader()
        stats.sort_stats('cumulative')
        stats.print_stats()
        # uncomment this to see who's calling what
        # stats.print_callers()
        return result
    return wrapper

def _profile(filename, fn, *args, **kw):
    load_stats = lambda: pstats.Stats(filename)
    gc.collect()

    began = time.time()
    # runctx writes 'result' into the namespace dict we pass in; using an
    # explicit dict is more reliable than mutating the function's locals()
    ns = {'fn': fn, 'args': args, 'kw': kw, 'result': None}
    profiler.runctx('result = fn(*args, **kw)', globals(), ns,
                    filename=filename)
    ended = time.time()

    return ended - began, load_stats, ns['result']

To profile a section of code, place it in a function with the decorator:

@profile
def go():
    return Session.query(FooClass).filter(FooClass.somevalue==8).all()
myfoos = go()

The output of profiling can be used to get an idea of where time is being spent. If, for example, you see all the time being spent within cursor.execute(), that's the low-level DBAPI call to the database, and it means your query should be optimized, either by adding indexes or by restructuring the query and/or the underlying schema. For that task I would recommend using pgadmin along with its graphical EXPLAIN utility to see what kind of work the query is doing.
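The text recommends pgadmin's graphical EXPLAIN for PostgreSQL; the same before/after-an-index check can be sketched in the stdlib with SQLite's EXPLAIN QUERY PLAN (table and index names here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY, somevalue INTEGER)")

# Without an index on somevalue, the planner must scan the whole table
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM foos WHERE somevalue = 8").fetchall()
print(plan_before[0][-1])   # detail column mentions a SCAN

conn.execute("CREATE INDEX ix_somevalue ON foos (somevalue)")

# With the index in place, the planner can do an index search instead
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM foos WHERE somevalue = 8").fetchall()
print(plan_after[0][-1])    # detail column mentions ix_somevalue
```

If profiling shows the time inside cursor.execute(), this is the level at which the fix lives: the plan, not the Python code.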

If you see many thousands of calls related to fetching rows, it may mean your query is returning more rows than expected - a cartesian product as a result of an incomplete join can cause this issue. Yet another issue is time spent within type handling - a SQLAlchemy type such as Unicode will perform string encoding/decoding on bind parameters and result columns, which may not be needed in all cases.

The output of a profile can be a little daunting, but after some practice it is very easy to read. There was once someone on the mailing list claiming slowness, and after having him post the results of a profile, I was able to demonstrate that the speed problems were due to network latency - the time spent within cursor.execute() as well as all Python methods was very fast, whereas the majority of time was spent on socket.receive().

If you're feeling ambitious, there's also a more involved example of SQLAlchemy profiling within the SQLAlchemy unit tests, if you poke around http://www.sqlalchemy.org/trac/browser/sqlalchemy/trunk/test/aaa_profiling . There, we have tests using decorators that assert a maximum number of method calls being used for particular operations, so that if something inefficient gets checked in, the tests will reveal it (it is important to note that in Python, function calls have the highest overhead of any operation, and the count of calls is more often than not nearly proportional to time spent). Of note are the "zoomark" tests which use a fancy "SQL capturing" scheme that cuts the overhead of the DBAPI out of the equation - although that technique isn't really necessary for garden-variety profiling.
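The call-count assertion idea can be sketched with cProfile directly (count_calls and work are illustrative helpers, not SQLAlchemy's actual test harness):

```python
import cProfile
import io
import pstats

def count_calls(fn, *args, **kw):
    """Run fn under cProfile and return the total number of function calls."""
    pr = cProfile.Profile()
    pr.enable()
    fn(*args, **kw)
    pr.disable()
    stats = pstats.Stats(pr, stream=io.StringIO())
    return stats.total_calls

def work():
    # stand-in for "a particular operation" whose call count we police
    return [str(i) for i in range(100)]

calls = count_calls(work)
# assert a ceiling rather than an exact figure: the precise count varies
# by interpreter version, but a regression that multiplies calls will trip it
assert calls < 1000
```

Because call count tracks time so closely in Python, a test like this catches an inefficient check-in without the flakiness of asserting on wall-clock time.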
