Bad Django / uwsgi performance


Question



I am running a Django app with nginx and uwsgi. Here's how I run uwsgi:

sudo uwsgi -b 25000 --chdir=/www/python/apps/pyapp --module=wsgi:application --env DJANGO_SETTINGS_MODULE=settings --socket=/tmp/pyapp.socket --cheaper=8 --processes=16  --harakiri=10  --max-requests=5000  --vacuum --master --pidfile=/tmp/pyapp-master.pid --uid=220 --gid=499
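For readers skimming the flags, here's what each one does (summarized from uwsgi's documentation; the command itself is unchanged):

```shell
#   -b 25000              buffer size for request headers, in bytes
#   --cheaper=8           adaptive worker mode: keep at least 8 workers,
#                         spawning up to --processes under load
#   --processes=16        upper bound of 16 worker processes
#   --harakiri=10         recycle any worker stuck on a request for >10 s
#   --max-requests=5000   recycle a worker after it serves 5000 requests
#   --vacuum              remove sockets and pidfile on exit
#   --master              run a master process that supervises the workers
```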

And the nginx configuration:

server {
    listen 80;
    server_name test.com;

    root /www/python/apps/pyapp/;

    access_log /var/log/nginx/test.com.access.log;
    error_log /var/log/nginx/test.com.error.log;

    # https://docs.djangoproject.com/en/dev/howto/static-files/#serving-static-files-in-production
    location /static/ {
        alias /www/python/apps/pyapp/static/;
        expires 30d;
    }

    location /media/ {
        alias /www/python/apps/pyapp/media/;
        expires 30d;
    }

    location / {
        uwsgi_pass unix:///tmp/pyapp.socket;
        include uwsgi_params;
        proxy_read_timeout 120;
    }

    # what to serve if upstream is not available or crashes
    #error_page 500 502 503 504 /media/50x.html;
}
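One detail worth flagging: proxy_read_timeout applies to proxy_pass, not uwsgi_pass, so that line in the location / block has no effect here. For the uwsgi protocol the equivalent directive is uwsgi_read_timeout; a sketch of the block with that swapped in (same value):

```
location / {
    uwsgi_pass unix:///tmp/pyapp.socket;
    include uwsgi_params;
    uwsgi_read_timeout 120;
}
```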

Here comes the problem. When running "ab" (ApacheBench) against the server, I get the following results:

nginx version: nginx/1.2.6

uwsgi version: 1.4.5

Server Software:        nginx/1.0.15
Server Hostname:        pycms.com
Server Port:            80

Document Path:          /api/nodes/mostviewed/8/?format=json
Document Length:        8696 bytes

Concurrency Level:      100
Time taken for tests:   41.232 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      8866000 bytes
HTML transferred:       8696000 bytes
Requests per second:    24.25 [#/sec] (mean)
Time per request:       4123.216 [ms] (mean)
Time per request:       41.232 [ms] (mean, across all concurrent requests)
Transfer rate:          209.99 [Kbytes/sec] received
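The headline figures above are internally consistent; a quick back-of-the-envelope check (numbers copied straight from the report):

```python
# Re-derive ab's summary figures from the raw totals above.
total_time_s = 41.232
requests = 1000
concurrency = 100
total_bytes = 8866000

req_per_sec = requests / total_time_s                   # ~24.25 [#/sec]
time_per_request_ms = concurrency / req_per_sec * 1000  # ~4123.2 [ms] (mean)
transfer_rate_kb_s = total_bytes / total_time_s / 1024  # ~209.99 [Kbytes/sec]

print(round(req_per_sec, 2), round(time_per_request_ms, 1), round(transfer_rate_kb_s, 2))
```

So the 4123 ms mean time per request is simply concurrency divided by throughput: the server, not the network, is the limiting factor.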

While running on 500 concurrency level

Concurrency Level:      500
Time taken for tests:   2.175 seconds
Complete requests:      1000
Failed requests:        50
   (Connect: 0, Receive: 0, Length: 50, Exceptions: 0)
Write errors:           0
Non-2xx responses:      950
Total transferred:      629200 bytes
HTML transferred:       476300 bytes
Requests per second:    459.81 [#/sec] (mean)
Time per request:       1087.416 [ms] (mean)
Time per request:       2.175 [ms] (mean, across all concurrent requests)
Transfer rate:          282.53 [Kbytes/sec] received

As you can see, all requests on the server fail with either timeout errors, "Client prematurely disconnected", or:

writev(): Broken pipe [proto/uwsgi.c line 124] during GET /api/nodes/mostviewed/9/?format=json

Here's a little bit more about my application: basically, it's a collection of models that reflect the MySQL tables containing all the content. On the frontend, I have django-rest-framework serving JSON content to the clients.

I've installed django-profiling and django-debug-toolbar to see what's going on. Here's what I get from django-profiling when running a single request:

Instance wide RAM usage

Partition of a set of 147315 objects. Total size = 20779408 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  63960  43  5726288  28   5726288  28 str
     1  36887  25  3131112  15   8857400  43 tuple
     2   2495   2  1500392   7  10357792  50 dict (no owner)
     3    615   0  1397160   7  11754952  57 dict of module
     4   1371   1  1236432   6  12991384  63 type
     5   9974   7  1196880   6  14188264  68 function
     6   8974   6  1076880   5  15265144  73 types.CodeType
     7   1371   1  1014408   5  16279552  78 dict of type
     8   2684   2   340640   2  16620192  80 list
     9    382   0   328912   2  16949104  82 dict of class
<607 more rows. Type e.g. '_.more' to view.>



CPU Time for this request

         11068 function calls (10158 primitive calls) in 0.064 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/django/views/generic/base.py:44(view)
        1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/django/views/decorators/csrf.py:76(wrapped_view)
        1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/rest_framework/views.py:359(dispatch)
        1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/rest_framework/generics.py:144(get)
        1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/rest_framework/mixins.py:46(list)
        1    0.000    0.000    0.038    0.038 /usr/lib/python2.6/site-packages/rest_framework/serializers.py:348(data)
     21/1    0.000    0.000    0.038    0.038 /usr/lib/python2.6/site-packages/rest_framework/serializers.py:273(to_native)
     21/1    0.000    0.000    0.038    0.038 /usr/lib/python2.6/site-packages/rest_framework/serializers.py:190(convert_object)
     11/1    0.000    0.000    0.036    0.036 /usr/lib/python2.6/site-packages/rest_framework/serializers.py:303(field_to_native)
    13/11    0.000    0.000    0.033    0.003 /usr/lib/python2.6/site-packages/django/db/models/query.py:92(__iter__)
      3/1    0.000    0.000    0.033    0.033 /usr/lib/python2.6/site-packages/django/db/models/query.py:77(__len__)
        4    0.000    0.000    0.030    0.008 /usr/lib/python2.6/site-packages/django/db/models/sql/compiler.py:794(execute_sql)
        1    0.000    0.000    0.021    0.021 /usr/lib/python2.6/site-packages/django/views/generic/list.py:33(paginate_queryset)
        1    0.000    0.000    0.021    0.021 /usr/lib/python2.6/site-packages/django/core/paginator.py:35(page)
        1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/core/paginator.py:20(validate_number)
        3    0.000    0.000    0.020    0.007 /usr/lib/python2.6/site-packages/django/core/paginator.py:57(_get_num_pages)
        4    0.000    0.000    0.020    0.005 /usr/lib/python2.6/site-packages/django/core/paginator.py:44(_get_count)
        1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/query.py:340(count)
        1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/sql/query.py:394(get_count)
        1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/query.py:568(_prefetch_related_objects)
        1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/query.py:1596(prefetch_related_objects)
        4    0.000    0.000    0.020    0.005 /usr/lib/python2.6/site-packages/django/db/backends/util.py:36(execute)
        1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/sql/query.py:340(get_aggregation)
        5    0.000    0.000    0.020    0.004 /usr/lib64/python2.6/site-packages/MySQLdb/cursors.py:136(execute)
        2    0.000    0.000    0.020    0.010 /usr/lib/python2.6/site-packages/django/db/models/query.py:1748(prefetch_one_level)
        4    0.000    0.000    0.020    0.005 /usr/lib/python2.6/site-packages/django/db/backends/mysql/base.py:112(execute)
        5    0.000    0.000    0.019    0.004 /usr/lib64/python2.6/site-packages/MySQLdb/cursors.py:316(_query)
       60    0.000    0.000    0.018    0.000 /usr/lib/python2.6/site-packages/django/db/models/query.py:231(iterator)
        5    0.012    0.002    0.015    0.003 /usr/lib64/python2.6/site-packages/MySQLdb/cursors.py:278(_do_query)
       60    0.000    0.000    0.013    0.000 /usr/lib/python2.6/site-packages/django/db/models/sql/compiler.py:751(results_iter)
       30    0.000    0.000    0.010    0.000 /usr/lib/python2.6/site-packages/django/db/models/manager.py:115(all)
       50    0.000    0.000    0.009    0.000 /usr/lib/python2.6/site-packages/django/db/models/query.py:870(_clone)
       51    0.001    0.000    0.009    0.000 /usr/lib/python2.6/site-packages/django/db/models/sql/query.py:235(clone)
        4    0.000    0.000    0.009    0.002 /usr/lib/python2.6/site-packages/django/db/backends/__init__.py:302(cursor)
        4    0.000    0.000    0.008    0.002 /usr/lib/python2.6/site-packages/django/db/backends/mysql/base.py:361(_cursor)
        1    0.000    0.000    0.008    0.008 /usr/lib64/python2.6/site-packages/MySQLdb/__init__.py:78(Connect)
  910/208    0.003    0.000    0.008    0.000 /usr/lib64/python2.6/copy.py:144(deepcopy)
       22    0.000    0.000    0.007    0.000 /usr/lib/python2.6/site-packages/django/db/models/query.py:619(filter)
       22    0.000    0.000    0.007    0.000 /usr/lib/python2.6/site-packages/django/db/models/query.py:633(_filter_or_exclude)
       20    0.000    0.000    0.005    0.000 /usr/lib/python2.6/site-packages/django/db/models/fields/related.py:560(get_query_set)
        1    0.000    0.000    0.005    0.005 /usr/lib64/python2.6/site-packages/MySQLdb/connections.py:8()

..etc

However, django-debug-toolbar shows the following:

Resource Usage
Resource    Value
User CPU time   149.977 msec
System CPU time 119.982 msec
Total CPU time  269.959 msec
Elapsed time    326.291 msec
Context switches    11 voluntary, 40 involuntary

and 5 queries in 27.1 ms
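It's worth relating those two numbers: the 5 queries account for only about 8% of the request's wall time, which already hints that the DB is not the bottleneck for a single request:

```python
# Figures from django-debug-toolbar above.
db_ms, elapsed_ms, total_cpu_ms = 27.1, 326.291, 269.959

db_share = db_ms / elapsed_ms * 100  # ~8.3% of wall time spent in SQL
print(round(db_share, 1))
```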

The problem is that top shows the load average rising quickly, while ApacheBench (which I ran both on the local server and from a remote machine on the network) shows that I am not serving many requests per second. What is the problem? This is as far as I could get by profiling the code, so I would appreciate it if someone could point out what I am doing wrong here.

Edit (23/02/2013): Adding more details based on Andrew Alcock's answer. The points that require my attention/answer are: (3) I've executed "show global variables" on MySQL and found that max_connections is set to 151, which is more than enough to serve the workers I am starting for uwsgi.

(3)(4)(2) The single request I am profiling is the heaviest one. It executes 4 queries according to django-debug-toolbar, and the queries run in 3.71, 2.83, 0.88 and 4.84 ms respectively.

(4) Are you referring to memory paging here? If so, how could I tell?

(5) On 16 workers, at a concurrency of 100 with 1000 requests, the load average goes up to ~12. I ran the tests with different numbers of workers (concurrency level 100):

  1. 1 worker, load average ~1.85, 19 reqs/second, Time per request: 5229.520 ms, 0 non-2xx
  2. 2 workers, load average ~1.5, 19 reqs/second, Time per request: 516.520 ms, 0 non-2xx
  3. 4 workers, load average ~3, 16 reqs/second, Time per request: 5929.921 ms, 0 non-2xx
  4. 8 workers, load average ~5, 18 reqs/second, Time per request: 5301.458 ms, 0 non-2xx
  5. 16 workers, load average ~19, 15 reqs/second, Time per request: 6384.720 ms, 0 non-2xx
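One way to read these runs: if w saturated workers sustain r requests/sec, each request occupies a worker for roughly w / r seconds. Applying that rough model (it ignores idle workers and queueing) to the runs above shows the implied per-request service time ballooning as workers are added on a single core:

```python
# (workers, requests/sec) pairs taken from the runs above.
runs = [(1, 19), (2, 19), (4, 16), (8, 18), (16, 15)]

for workers, req_per_sec in runs:
    service_ms = workers / req_per_sec * 1000  # time a request holds a worker
    print(f"{workers:2d} workers -> ~{service_ms:.0f} ms per request")
```

Throughput stays flat at ~15-19 req/s while per-request time grows roughly linearly with the worker count, which is the signature of workers contending for one CPU rather than doing useful parallel work.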

As you can see, the more workers we have, the more load there is on the system. I can see in uwsgi's daemon log that the response time in milliseconds increases when I increase the number of workers.

On 16 workers, running at a concurrency level of 500, uwsgi starts logging errors:

 writev(): Broken pipe [proto/uwsgi.c line 124] 

Load goes up to ~10 as well. The tests don't take much time because 923 out of 1000 responses are non-2xx, which is why the responses here are quite fast: they are almost empty. This is also a reply to your point #4 in the summary.

Assuming that what I am facing here is OS latency due to I/O and networking, what is the recommended way to scale this up? New hardware? A bigger server?

Thanks

Solution

EDIT 1 Having seen the comment that you have 1 virtual core, I'm adding commentary on all the relevant points.

EDIT 2 More information from Maverick, so I'm eliminating the ideas that have been ruled out and developing the confirmed issues.

EDIT 3 Filled out more details about the uwsgi request queue and scaling options. Improved grammar.

EDIT 4 Updates from Maverick and minor improvements

Comments are too small, so here are some thoughts:

  1. Load average is basically how many processes are running on or waiting for CPU attention. For a perfectly loaded system with 1 CPU core, the load average should be 1.0; for a 4-core system, it should be 4.0. The moment you run the web test, the thread count rockets and you have a lot of processes waiting for CPU. Unless the load average exceeds the number of CPU cores by a significant margin, it is not a concern.
  2. The first 'Time per request' value of 4s correlates to the length of the request queue - 1000 requests dumped on Django nearly instantaneously and took on average 4s to service, about 3.4s of which were waiting in a queue. This is due to the very heavy mismatch between the number of requests (100) vs. the number of processors (16) causing 84 of the requests to be waiting for a processor at any one moment.
  3. Running at a concurrency of 100, the tests take 41 seconds at 24 requests/sec. You have 16 processes (threads), so each request is processed in about 700 ms. Given your type of transaction, that is a long time per request. This may be because:

    1. The CPU cost of each request is high in Django (which is highly unlikely given the low CPU value from the debug toolbar)
    2. The OS is task switching a lot (especially if the load average is higher than 4-8), and the latency is purely down to having too many processes.
    3. There are not enough DB connections serving the 16 processes, so processes are waiting for one to become available. Do you have at least one connection available per process?
    4. There is considerable latency around the DB, either:

      1. Tens of small requests each taking, say, 10 ms, most of which is networking overhead. If so, can you introduce caching or reduce the SQL calls to a smaller number? Or
      2. One or a couple of requests taking hundreds of ms. To check this, run profiling on the DB. If so, you need to optimise that request.

  4. The split between system and user CPU cost is unusually high in system, although the total CPU is low. This implies that most of the work in Django is kernel related, such as networking or disk. In this scenario, it might be network costs (eg receiving and sending HTTP requests and receiving and sending requests to the DB). Sometimes this will be high because of paging. If there's no paging going on, then you probably don't have to worry about this at all.

  5. You have set the processes at 16, but have a high load average (how high you don't state). Ideally you should always have at least one process waiting for CPU (so that CPUs don't spin idly). The processes here don't seem CPU bound, but have significant latency, so you need more processes than cores. How many more? Try running uwsgi with different numbers of processes (1, 2, 4, 8, 12, 16, 24, etc.) until you reach the best throughput. If the latency of the average process changes, you will need to adjust this again.
  6. The 500 concurrency level definitely is a problem, but is it the client or the server? The report says 50 (out of 1000) had the incorrect content-length, which implies a server problem. The non-2xx responses also seem to point there. Is it possible to capture the non-2xx responses for debugging? Stack traces or the specific error message would be incredibly useful. (EDIT) This is caused by the uwsgi request queue running with its default value of 100.
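Point 2 can be checked numerically against the first ab run (16 workers, 1000 requests in 41.232 s): per-request service time comes out around 660 ms, and with 100 requests in flight, queueing accounts for the remaining ~3.4 s of the 4.1 s mean:

```python
concurrency, workers = 100, 16
total_time_s, requests = 41.232, 1000

service_s = workers * total_time_s / requests                 # ~0.66 s serving
queue_wait_s = (concurrency - workers) / workers * service_s  # ~3.46 s queued
total_s = service_s + queue_wait_s                            # ~4.12 s

print(round(service_s, 2), round(queue_wait_s, 2), round(total_s, 2))
```

The ~4.12 s total matches ab's reported 4123 ms mean time per request almost exactly.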

So, in summary:

  1. Django seems fine
  2. Mismatch between concurrency of load test (100 or 500) vs. processes (16): You're pushing way too many concurrent requests into the system for the number of processes to handle. Once you are above the number of processes, all that will happen is that you will lengthen the HTTP Request queue in the web server
  3. There is a large latency, so either

    1. Mismatch between processes (16) and CPU cores (1): If the load average is >3, then it's probably too many processes. Try again with a smaller number of processes

      1. Load average > 2 -> try 8 processes
      2. Load average > 4 -> try 4 processes
      3. Load average > 8 -> try 2 processes

    2. If the load average is <3, it may be the DB, so profile the DB to see whether there are loads of small requests (additively causing the latency) or whether one or two SQL statements are the problem

  4. Without capturing the failed response, there's not much I can say about the failures at 500 concurrency

Developing ideas

Your load averages of >10 on a single-core machine are really nasty and (as you observe) lead to a lot of task switching and general slow behaviour. I personally don't remember seeing a machine with a load average of 19 (which you have for 16 processes) - congratulations for getting it so high ;)

The DB performance is great, so I'd give that an all-clear right now.

Paging: To answer your question on how to see paging - you can detect OS paging in several ways. For example, in top, the header has page-ins and page-outs (see the last line):

Processes: 170 total, 3 running, 4 stuck, 163 sleeping, 927 threads                                                                                                        15:06:31
Load Avg: 0.90, 1.19, 1.94  CPU usage: 1.37% user, 2.97% sys, 95.65% idle  SharedLibs: 144M resident, 0B data, 24M linkedit.
MemRegions: 31726 total, 2541M resident, 120M private, 817M shared. PhysMem: 1420M wired, 3548M active, 1703M inactive, 6671M used, 1514M free.
VM: 392G vsize, 1286M framework vsize, 1534241(0) pageins, 0(0) pageouts. Networks: packets: 789684/288M in, 912863/482M out. Disks: 739807/15G read, 996745/24G written.

Number of processes: In your current configuration, the number of processes is way too high. Scale the number of processes back to 2. We might bring this value up later, depending on shifting further load off this server.

Location of Apache Benchmark: The load average of 1.85 for one process suggests to me that you are running the load generator on the same machine as uwsgi - is that correct?

If so, you really need to run this from another machine otherwise the test runs are not representative of actual load - you're taking memory and CPU from the web processes for use in the load generator. In addition, the load generator's 100 or 500 threads will generally stress your server in a way that does not happen in real life. Indeed this might be the reason the whole test fails.

Location of the DB: The load average for one process also suggests that you are running the DB on the same machine as the web processes - is this correct?

If I'm correct about the DB, then the first and best way to start scaling is to move the DB to another machine. We do this for a couple of reasons:

  1. A DB server needs a different hardware profile from a processing node:

    1. Disk: DB needs a lot of fast, redundant, backed up disk, and a processing node needs just a basic disk
    2. CPU: A processing node needs the fastest CPU you can afford whereas a DB machine can often make do without (often its performance is gated on disk and RAM)
    3. RAM: a DB machine generally needs as much RAM as possible (and the fastest DBs have all their data in RAM), whereas many processing nodes need much less (yours needs about 20MB per process - very small)
    4. Scaling: Atomic DBs scale best by having monster machines with many CPUs whereas the web tier (not having state) can scale by plugging in many identical small boxen.

  2. CPU affinity: It's better for the CPU to have a load average of 1.0 and processes to have affinity to a single core. Doing so maximizes the use of the CPU cache and minimizes task switching overheads. By separating the DB and processing nodes, you are enforcing this affinity in HW.

500 concurrency with exceptions: uwsgi's request queue holds at most 100 entries by default - if uwsgi receives a request when the queue is full, the request is rejected with a 5xx error. I think this is what was happening in your 500 concurrency load test - basically the queue filled up with the first 100 or so threads, then the other 400 threads issued the remaining 900 requests and received immediate 5xx errors.

To handle 500 requests per second you need to ensure two things:

  1. The Request Queue size is configured to handle the burst: Use the --listen argument to uwsgi
  2. The system can handle a throughput above 500 requests per second if 500 is a normal condition, or a bit below if 500 is a peak. See the scaling notes below.

I imagine that uwsgi has the queue set to a smaller number to better handle DDoS attacks; if placed under huge load, most requests immediately fail with almost no processing allowing the box as a whole to still be responsive to the administrators.
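A concrete way to apply both points (values here are illustrative, not tuned): raise uwsgi's listen queue with --listen, and note that on Linux the effective backlog is also capped by the kernel's net.core.somaxconn, which usually needs raising in step:

```shell
# Raise the kernel cap on socket accept backlogs first (Linux);
# uwsgi's --listen value is limited by net.core.somaxconn.
sudo sysctl -w net.core.somaxconn=1024

# Then restart uwsgi with a larger listen queue (other flags as in the
# question, trimmed here for brevity; 1024 is an illustrative value).
sudo uwsgi --master --processes 2 --listen 1024 \
    --chdir=/www/python/apps/pyapp --module=wsgi:application \
    --socket=/tmp/pyapp.socket
```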

General advice for scaling a system

Your most important consideration is probably to maximize throughput. Another possible goal is to minimize response time, but I won't discuss that here. In maximizing throughput, you are trying to maximize the system, not individual components; some local decreases can improve overall system throughput (for example, making a change that happens to add latency in the web tier in order to improve the performance of the DB can be a net gain).

Onto specifics:

  1. Move the DB to a separate machine. After this, profile the DB during your load test by running top and your favorite MySQL monitoring tool. You need to be able to profile the DB under load. Moving the DB to a separate machine will introduce some additional latency (several ms) per request, so expect to slightly increase the number of processes at the web tier to keep the same throughput.
  2. Ensure that the uwsgi request queue is large enough to handle a burst of traffic, using the --listen argument. This should be several times the maximum steady-state requests-per-second your system can handle.
  3. On the web/app tier: balance the number of processes against the number of CPU cores and the inherent latency in the process. Too many processes slow performance; too few mean that you'll never fully utilize the system resources. There is no fixed balancing point, as every application and usage pattern is different, so benchmark and adjust. As a guide, use the processes' latency: if each task has:

    • 0% latency, then you need 1 process per core
    • 50% latency (i.e. the CPU time is half the actual time), then you need 2 processes per core
    • 67% latency, then you need 3 processes per core
  4. Check top during the test to ensure that you are above 90% CPU utilisation (for every core) and that the load average is a little above 1.0. If the load average is higher, scale back the processes. If all goes well, at some point you won't be able to achieve this target, and the DB might then be the bottleneck.

  5. At some point you will need more power in the web tier. You can either choose to add more CPU to the machine (relatively easy) and so add more processes, and/or you can add more processing nodes (horizontal scalability). The latter can be achieved in uwsgi using the method discussed here by Łukasz Mierzwa.
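The guide in point 3 boils down to processes_per_core = 1 / (1 - latency_fraction), where latency_fraction is the share of a request's wall time spent waiting rather than using the CPU:

```python
def processes_per_core(latency_fraction):
    """Rule-of-thumb workers per core for a given waiting share."""
    return round(1 / (1 - latency_fraction))

for latency in (0.0, 0.5, 0.67):
    print(f"{latency:.0%} latency -> {processes_per_core(latency)} per core")
```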
