Django (?) really slow with large datasets after doing some python profiling


Problem description


I was comparing an old PHP script of mine against the newer, fancier Django version. The PHP one, with full HTML output and all, was running faster. MUCH faster, to the point that something has to be wrong with the Django one.

First, some context: I have a page that spits out reports of sales data. The data can be filtered by a number of things but is mostly filtered by date. This makes it a bit hard to cache, because the possible results are nearly endless. The report involves a lot of numbers and calculations, but handling them was never much of a problem in PHP.

UPDATES:

  • After some additional testing, I found nothing within my view that is causing the slowdown. If I am simply number-crunching the data and spitting out 5 rows of rendered HTML, it's not that slow (though still slower than PHP). If I am rendering a lot of data, however, it's VERY slow.

  • Whenever I run a large report (e.g. all sales for the year), the machine's CPU usage goes to 100%. I don't know if this means much. I am using mod_python and Apache. Would switching to WSGI help?

  • My template tags that show the subtotals/totals take anywhere from 0.1 seconds to 1 second to process for really large sets. I call them about 6 times within the report, so they don't seem like the biggest issue.

I then ran a Python profiler and got these results:

Ordered by: internal time
   List reduced from 3074 to 20 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  2939417   26.290    0.000   44.857    0.000 /usr/lib/python2.5/tokenize.py:212(generate_tokens)
  2822655   17.049    0.000   17.049    0.000 {built-in method match}
  1689928   15.418    0.000   23.297    0.000 /usr/lib/python2.5/decimal.py:515(__new__)
 12289605   11.464    0.000   11.464    0.000 {isinstance}
   882618    9.614    0.000   25.518    0.000 /usr/lib/python2.5/decimal.py:1447(_fix)
    17393    8.742    0.001   60.798    0.003 /usr/lib/python2.5/tokenize.py:158(tokenize_loop)
       11    7.886    0.717    7.886    0.717 {method 'accept' of '_socket.socket' objects}
   365577    7.854    0.000   30.233    0.000 /usr/lib/python2.5/decimal.py:954(__add__)
  2922024    7.199    0.000    7.199    0.000 /usr/lib/python2.5/inspect.py:571(tokeneater)
   438750    5.868    0.000   31.033    0.000 /usr/lib/python2.5/decimal.py:1064(__mul__)
    60799    5.666    0.000    9.377    0.000 /usr/lib/python2.5/site-packages/django/db/models/base.py:241(__init__)
    17393    4.734    0.000    4.734    0.000 {method 'query' of '_mysql.connection' objects}
  1124348    4.631    0.000    8.469    0.000 /usr/lib/python2.5/site-packages/django/utils/encoding.py:44(force_unicode)
   219076    4.139    0.000  156.618    0.001 /usr/lib/python2.5/site-packages/django/template/__init__.py:700(_resolve_lookup)
  1074478    3.690    0.000   11.096    0.000 /usr/lib/python2.5/decimal.py:5065(_convert_other)
  2973281    3.424    0.000    3.424    0.000 /usr/lib/python2.5/decimal.py:718(__nonzero__)
   759014    2.962    0.000    3.371    0.000 /usr/lib/python2.5/decimal.py:4675(__init__)
   381756    2.806    0.000  128.447    0.000 /usr/lib/python2.5/site-packages/django/db/models/fields/related.py:231(__get__)
   842130    2.764    0.000    3.557    0.000 /usr/lib/python2.5/decimal.py:3339(_dec_from_triple)
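The standard library's cProfile and pstats modules print a listing in this format: "Ordered by: internal time", trimmed to the top 20 rows. Below is a minimal sketch of producing one. `build_report` is a hypothetical stand-in for the report view, not the code from the question.

```python
import cProfile
import io
import pstats
from decimal import Decimal


def build_report(n):
    """Hypothetical stand-in for the report view: sums n Decimal amounts."""
    total = Decimal("0")
    for i in range(n):
        total += Decimal(i) / 100
    return total


profiler = cProfile.Profile()
profiler.enable()
build_report(10000)
profiler.disable()

out = io.StringIO()
# Sort by internal time ("tottime") and print only the top 20 rows.
pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(20)
print(out.getvalue())
```

Passing "cumulative" to sort_stats orders the rows by cumtime instead. That groups the time under high-level callers, such as the template and related-field lookups in the listing above.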

tokenize.py comes out on top, which makes some sense since I am doing a lot of number formatting. decimal.py makes sense since the report is essentially 90% numbers. I have no clue what the built-in method match is, as I am not doing any regex or similar in my own code (something Django is doing?). The closest thing is that I am using itertools.ifilter.
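The decimal.py rows follow from how the module works. In Python 2.5, `decimal` was written in pure Python; the C implementation only became the default in Python 3.3. Every arithmetic operation on a `Decimal` goes through several steps:

- It converts an int operand to a `Decimal` (`_convert_other`, which constructs one via `__new__`).
- It computes the result and rounds it to the context precision (`_fix`).
- It builds a new immutable `Decimal` for the result.

Below is a minimal sketch of the kind of per-row arithmetic a sales report performs. The rows are made-up sample data, not the question's. Each row costs two multiplications, a subtraction and an addition.

```python
from decimal import Decimal

# Made-up report rows: (unit price, quantity, discount rate)
rows = [
    (Decimal("19.99"), 3, Decimal("0.10")),
    (Decimal("5.00"), 10, Decimal("0")),
    (Decimal("120.50"), 1, Decimal("0.25")),
]


def report_total(rows):
    """Sum discounted line totals; every * , - and + creates a new Decimal."""
    total = Decimal("0")
    for price, qty, discount in rows:
        # The int operands (qty and the literal 1) are converted to Decimal first.
        line = price * qty * (1 - discount)
        total += line
    return total.quantize(Decimal("0.01"))


print(report_total(rows))  # 194.35
```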

It seems those are the main culprits, and if I could figure out how to reduce their processing time, I would have a much, much faster page.

Does anyone have any suggestions on where I could start reducing this? I don't really know how I would fix the tokenize/decimal issues without simply removing them.

Update: I ran some tests with and without filters on most of the data, and the timings came back pretty much the same. The latter was a bit faster, but not by enough to be the cause of the issue. What exactly is going on in tokenize.py?
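The profile hints at what is driving tokenize.py:

- `inspect.py:571(tokeneater)` is called almost as often as `generate_tokens` (2922024 vs 2939417 calls). `tokeneater` belongs to the `inspect` module's block finder, which consumes tokens to work out where a block of source code ends; `inspect.getsource()`-style calls rely on this.
- `tokenize_loop` was called 17393 times, the same count as the `'query' of '_mysql.connection'` calls. This suggests, but does not prove, that something reads source code once per query.

The excerpt does not show which code makes those calls. Below is a minimal sketch using Python 3's standard library. It shows that `inspect.getblock()`, an undocumented helper inside `inspect`, runs the tokenizer.

```python
import cProfile
import inspect
import os
import pstats

# A tiny source listing: one function followed by unrelated module code.
lines = [
    "def subtotal(rows):\n",
    "    return sum(rows)\n",
    "\n",
    "x = 1\n",
]

profiler = cProfile.Profile()
profiler.enable()
# inspect.getblock() is the (undocumented) helper that inspect.getsource()
# uses to find where a def/class body ends; it tokenizes the lines to do so.
block = inspect.getblock(lines)
profiler.disable()

# Stats keys are (filename, line number, function name) tuples.
stats = pstats.Stats(profiler).stats
called = {(os.path.basename(filename), func) for filename, _, func in stats}

print(block)  # the two lines of the def
print(sorted(entry for entry in called if entry[0] in ("inspect.py", "tokenize.py")))
```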

Solution

There are a lot of things to assume about your problem, since you haven't posted any code sample.

Here are my assumptions: you are using Django's built-in ORM tools and models (i.e. sales_data = modelobj.objects.all()). On the PHP side, you are running direct SQL queries and working with the raw result set.

Django does a lot of type conversion and casting when it turns database query results into ORM model objects through the associated manager (objects by default).

In PHP you control the conversions and know exactly how to cast from one data type to another, so you save some execution time on that alone.

I would recommend trying to move some of that fancy number work into the database, especially if you are doing record-set based processing; databases eat that kind of processing for breakfast. In Django you can send raw SQL to the database: http://docs.djangoproject.com/en/dev/topics/db/sql/#topics-db-sql
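Below is a minimal sketch of moving the per-row arithmetic into the database. It uses the standard library's in-memory SQLite as a stand-in for MySQL. The `sale` table, its columns and the integer-cents storage are assumptions made for the example; a MySQL DECIMAL column would normally come back as `Decimal` already. In Django, the same SQL could be sent through `django.db.connection.cursor()`, as described in the linked documentation. Only the few aggregated rows are converted into Python objects, rather than every sale.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (sold_on TEXT, region TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [
        ("2009-01-05", "east", 1250),
        ("2009-01-20", "east", 800),
        ("2009-02-03", "west", 4999),
        ("2009-02-14", "east", 100),
        ("2009-03-02", "west", 700),  # outside the date range queried below
    ],
)


def monthly_totals(conn, start, end):
    """Let the database filter, group and sum; Python only converts the result rows."""
    sql = """
        SELECT substr(sold_on, 1, 7) AS month, region,
               SUM(amount_cents) AS total_cents, COUNT(*) AS n_sales
        FROM sale
        WHERE sold_on >= ? AND sold_on < ?
        GROUP BY month, region
        ORDER BY month, region
    """
    return [
        (month, region, Decimal(total_cents) / 100, n_sales)
        for month, region, total_cents, n_sales in conn.execute(sql, (start, end))
    ]


for row in monthly_totals(conn, "2009-01-01", "2009-03-01"):
    print(row)
```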

I hope this at least can get you pointed in the right direction...
