What is this cProfile result telling me I need to fix?

Question

I would like to improve the performance of a Python script and have been using cProfile to generate a performance report:

python -m cProfile -o chrX.prof ./bgchr.py ...args...

I opened this chrX.prof file with Python's pstats and printed out the statistics:

Python 2.7 (r27:82500, Oct  5 2010, 00:24:22) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pstats
>>> p = pstats.Stats('chrX.prof')
>>> p.sort_stats('name')
>>> p.print_stats()                                                                                                                                                                                                                        
Sun Oct 10 00:37:30 2010    chrX.prof                                                                                                                                                                                                      

         8760583 function calls in 13.780 CPU seconds                                                                                                                                                                                      

   Ordered by: function name                                                                                                                                                                                                               

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)                                                                                                                                                                    
        1    0.000    0.000    0.000    0.000 {_locale.setlocale}                                                                                                                                                                          
        1    1.128    1.128    1.128    1.128 {bz2.decompress}                                                                                                                                                                             
        1    0.002    0.002   13.780   13.780 {execfile}                                                                                                                                                                                   
  1750678    0.300    0.000    0.300    0.000 {len}                                                                                                                                                                                        
       48    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}                                                                                                                                                          
        1    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}                                                                                                                                                           
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}                                                                                                                                             
  1750676    0.496    0.000    0.496    0.000 {method 'join' of 'str' objects}                                                                                                                                                             
        1    0.007    0.007    0.007    0.007 {method 'read' of 'file' objects}                                                                                                                                                            
        1    0.000    0.000    0.000    0.000 {method 'readlines' of 'file' objects}                                                                                                                                                       
        1    0.034    0.034    0.034    0.034 {method 'rstrip' of 'str' objects}                                                                                                                                                           
       23    0.000    0.000    0.000    0.000 {method 'seek' of 'file' objects}                                                                                                                                                            
  1757785    1.230    0.000    1.230    0.000 {method 'split' of 'str' objects}                                                                                                                                                            
        1    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}                                                                                                                                                       
  1750676    0.872    0.000    0.872    0.000 {method 'write' of 'file' objects}                                                                                                                                                           
        1    0.007    0.007   13.778   13.778 ./bgchr:3(<module>)                                                                                                                                                                          
        1    0.000    0.000   13.780   13.780 <string>:1(<module>)                                                                                                                                                                         
        1    0.001    0.001    0.001    0.001 {open}                                                                                                                                                                                       
        1    0.000    0.000    0.000    0.000 {sys.exit}                                                                                                                                                                                   
        1    0.000    0.000    0.000    0.000 ./bgchr:36(checkCommandLineInputs)                                                                                                                                                           
        1    0.000    0.000    0.000    0.000 ./bgchr:27(checkInstallation)                                                                                                                                                                
        1    1.131    1.131   13.701   13.701 ./bgchr:97(extractData)                                                                                                                                                                      
        1    0.003    0.003    0.007    0.007 ./bgchr:55(extractMetadata)                                                                                                                                                                  
        1    0.064    0.064   13.771   13.771 ./bgchr:5(main)                                                                                                                                                                              
  1750677    8.504    0.000   11.196    0.000 ./bgchr:122(parseJarchLine)                                                                                                                                                                  
        1    0.000    0.000    0.000    0.000 ./bgchr:72(parseMetadata)                                                                                                                                                                    
        1    0.000    0.000    0.000    0.000 /home/areynolds/proj/tools/lib/python2.7/locale.py:517(setlocale) 

Question: What can I do about join, split and write operations to reduce the apparent impact they have on the performance of this script?

If it is relevant, here is the full source code to the script in question:

#!/usr/bin/env python

import sys, os, time, bz2, locale

def main(*args):
    # Constants
    global metadataRequiredFileSize
    metadataRequiredFileSize = 8192
    requiredVersion = (2,5)

    # Prep
    global whichChromosome
    whichChromosome = "all"
    checkInstallation(requiredVersion)
    checkCommandLineInputs()
    extractMetadata()
    parseMetadata()
    if whichChromosome == "--list":
        listMetadata()
        sys.exit(0)

    # Extract
    extractData()   
    return 0

def checkInstallation(rv):
    currentVersion = sys.version_info
    if currentVersion[0] == rv[0] and currentVersion[1] >= rv[1]:
        pass
    else:
        sys.stderr.write( "\n\t[%s] - Error: Your Python interpreter must be %d.%d or greater (within major version %d)\n" % (sys.argv[0], rv[0], rv[1], rv[0]) )
        sys.exit(-1)
    return

def checkCommandLineInputs():
    cmdName = sys.argv[0]
    argvLength = len(sys.argv[1:])
    if (argvLength == 0) or (argvLength > 2):
        sys.stderr.write( "\n\t[%s] - Usage: %s [<chromosome> | --list] <bjarch-file>\n\n" % (cmdName, cmdName) )
        sys.exit(-1)
    else:   
        global inFile
        global whichChromosome
        if argvLength == 1:
            inFile = sys.argv[1]
        elif argvLength == 2:
            whichChromosome = sys.argv[1]
            inFile = sys.argv[2]
        if inFile == "-" or inFile == "--list":
            sys.stderr.write( "\n\t[%s] - Usage: %s [<chromosome> | --list] <bjarch-file>\n\n" % (cmdName, cmdName) )
            sys.exit(-1)
    return

def extractMetadata():
    global metadataList
    global dataHandle
    metadataList = []
    dataHandle = open(inFile, 'rb')
    try:
        for data in dataHandle.readlines(metadataRequiredFileSize):     
            metadataLine = data
            metadataLines = metadataLine.split('\n')
            for line in metadataLines:      
                if line:
                    metadataList.append(line)
    except IOError:
        sys.stderr.write( "\n\t[%s] - Error: Could not extract metadata from %s\n\n" % (sys.argv[0], inFile) )
        sys.exit(-1)
    return

def parseMetadata():
    global metadataList
    global metadata
    metadata = []
    if not metadataList: # equivalent to "if len(metadataList) == 0"
        sys.stderr.write( "\n\t[%s] - Error: No metadata in %s\n\n" % (sys.argv[0], inFile) )
        sys.exit(-1)
    for entryText in metadataList:
        if entryText: # equivalent to "if len(entryText) > 0"
            entry = entryText.split('\t')
            filename = entry[0]
            chromosome = entry[0].split('.')[0]
            size = entry[1]
            entryDict = { 'chromosome':chromosome, 'filename':filename, 'size':size }
            metadata.append(entryDict)
    return

def listMetadata():
    for index in metadata:
        chromosome = index['chromosome']
        filename = index['filename']
        size = long(index['size'])
        sys.stdout.write( "%s\t%s\t%ld" % (chromosome, filename, size) )
    return

def extractData():
    global dataHandle
    global pLength
    global lastEnd
    locale.setlocale(locale.LC_ALL, 'POSIX')
    dataHandle.seek(metadataRequiredFileSize, 0) # move cursor past metadata
    for index in metadata:
        chromosome = index['chromosome']
        size = long(index['size'])
        pLength = 0L
        lastEnd = ""
        if whichChromosome == "all" or whichChromosome == index['chromosome']:
            dataStream = dataHandle.read(size)
            uncompressedData = bz2.decompress(dataStream)
            lines = uncompressedData.rstrip().split('\n')
            for line in lines:
                parseJarchLine(chromosome, line)
            if whichChromosome == chromosome:
                break
        else:
            dataHandle.seek(size, 1) # move cursor past chromosome chunk

    dataHandle.close()
    return

def parseJarchLine(chromosome, line):
    global pLength
    global lastEnd
    elements = line.split('\t')
    if len(elements) > 1:
        if lastEnd:
            start = long(lastEnd) + long(elements[0])
            lastEnd = long(start + pLength)
            sys.stdout.write("%s\t%ld\t%ld\t%s\n" % (chromosome, start, lastEnd, '\t'.join(elements[1:])))
        else:
            lastEnd = long(elements[0]) + long(pLength)
            sys.stdout.write("%s\t%ld\t%ld\t%s\n" % (chromosome, long(elements[0]), lastEnd, '\t'.join(elements[1:])))
    else:
        if elements[0].startswith('p'):
            pLength = long(elements[0][1:])
        else:
            start = long(long(lastEnd) + long(elements[0]))
            lastEnd = long(start + pLength)
            sys.stdout.write("%s\t%ld\t%ld\n" % (chromosome, start, lastEnd))               
    return

if __name__ == '__main__':
    sys.exit(main(*sys.argv))

Edit

If I comment out the sys.stdout.write statement in the first conditional of parseJarchLine(), then my runtime goes from 10.2 sec to 4.8 sec:

# with first conditional's "sys.stdout.write" enabled
$ time ./bgchr chrX test.bjarch > /dev/null
real    0m10.186s                                                                                                                                                                                        
user    0m9.917s                                                                                                                                                                                         
sys 0m0.160s  

# after first conditional's "sys.stdout.write" is commented out                                                                                                                                                                                           
$ time ./bgchr chrX test.bjarch > /dev/null
real    0m4.808s                                                                                                                                                                                         
user    0m4.561s                                                                                                                                                                                         
sys 0m0.156s

Is writing to stdout really that expensive in Python?

Recommended answer

ncalls is relevant only to the extent that comparing the numbers against other counts, such as the number of chars/fields/lines in a file, may highlight anomalies; tottime and cumtime are what really matter. cumtime is the time spent in the function/method including the time spent in the functions/methods that it calls; tottime is the time spent in the function/method excluding the time spent in the functions/methods that it calls.

I find it helpful to sort the stats on tottime and again on cumtime, not on name.
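As a rough illustration of the difference between the two columns, here is a minimal sketch that profiles two made-up functions (inner and outer are hypothetical names, not part of the script in the question) and prints the report sorted each way. It assumes a reasonably recent Python; the same sort keys work on a saved file such as chrX.prof.

```python
import cProfile
import pstats


def inner(n):
    # All of the arithmetic happens here, so this is where tottime accrues.
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer(n):
    # Does almost nothing itself; its cumtime includes the time spent in inner().
    return sum(inner(n) for _ in range(20))


profiler = cProfile.Profile()
profiler.enable()
outer(20000)
profiler.disable()

stats = pstats.Stats(profiler)
stats.strip_dirs()
stats.sort_stats('tottime').print_stats(5)   # where the time is actually spent
stats.sort_stats('cumtime').print_stats(5)   # which call trees are expensive
```

For the profile in the question, the equivalent would be pstats.Stats('chrX.prof').sort_stats('tottime').print_stats(10).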

bgchr definitely refers to the execution of the script and is not irrelevant, as it takes up 8.9 seconds out of 13.5; that 8.9 seconds does NOT include time in the functions/methods that it calls! Read carefully what @Lie Ryan says about modularising your script into functions, and implement his advice. Likewise what @jonesy says.

string is mentioned because you import string and use it in only one place: string.find(elements[0], 'p'). On another line in the output you'll notice that string.find was called only once, so it's not a performance problem in this run of this script. HOWEVER: you use str methods everywhere else. string functions are deprecated nowadays and are implemented by calling the corresponding str method. You would be better off writing elements[0].find('p') == 0 for an exact but faster equivalent, and might like to use elements[0].startswith('p'), which would save readers wondering whether that == 0 should actually be == -1.
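A small sketch of the two spellings, using a made-up field value, shows why startswith is the easier one to read:

```python
field = "p1234"   # hypothetical field value

# str.find returns the index of the first match, or -1 when there is none,
# so "starts with 'p'" has to be spelled as "== 0".
assert (field.find('p') == 0) is True

# str.startswith states the intent directly.
assert field.startswith('p') is True

# The two agree on non-matching input as well.
other = "1234p"
assert (other.find('p') == 0) == other.startswith('p')
```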

The four methods mentioned by @Bernd Petersohn take up only 3.7 seconds out of a total execution time of 13.541 seconds. Before worrying too much about those, modularise your script into functions, run cProfile again, and sort the stats by tottime.

Update after the question was updated with the changed script:

"Question: What can I do about join, split and write operations to reduce the apparent impact they have on the performance of this script?"

Huh? Those 3 together take 2.6 seconds out of the total of 13.8. Your parseJarchLine function is taking 8.5 seconds (which doesn't include time taken by functions/methods that it calls). assert(8.5 > 2.6)

Bernd has already pointed you at what you might consider doing with those. You are needlessly splitting the line completely only to join it up again when writing it out. You need to inspect only the first element. Instead of elements = line.split('\t') do elements = line.split('\t', 1) and replace '\t'.join(elements[1:]) by elements[1].
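To see that the two forms give the same result, here is a minimal check on a made-up line (the field values are hypothetical; only the tab-separated layout comes from the script):

```python
line = "1500\tid-1\t0.25\t+"

# Original approach: split every field, then glue all but the first back together.
elements = line.split('\t')
rest_via_join = '\t'.join(elements[1:])

# Suggested approach: split off only the first field; the remainder is kept intact.
first, rest = line.split('\t', 1)

assert first == elements[0] == "1500"
assert rest == rest_via_join == "id-1\t0.25\t+"

# A line with no tab yields a one-element list either way.
assert "p25".split('\t', 1) == ["p25"]
```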

Now let's dive into the body of parseJarchLine. The number of uses in the source and manner of the uses of the long built-in function are astonishing. Also astonishing is the fact that long is not mentioned in the cProfile output.

Why do you need long at all? Files over 2 GB? OK, then you need to consider that since Python 2.2, int overflow causes promotion to long instead of raising an exception. You can take advantage of faster execution of int arithmetic. You also need to consider that doing long(x) when x is already demonstrably a long is a waste of resources.
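The promotion behaviour can be checked directly. This sketch is written so that it also runs on Python 3, where there is only one integer type and long no longer exists; on Python 2.2 or later the addition below silently produces a long.

```python
import sys

big = sys.maxsize        # largest native word-sized value on this platform
bigger = big + 1         # no OverflowError: the integer simply grows

assert bigger > big
assert bigger - 1 == big

# Converting something that is already an integer is pure overhead.
offset = int("2147483648")      # parsed once from text
assert int(offset) == offset
```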

Here is the parseJarchLine function with removing-waste changes marked [1] and changing-to-int changes marked [2]. Good idea: make changes in small steps, re-test, re-profile.

def parseJarchLine(chromosome, line):
    global pLength
    global lastEnd
    elements = line.split('\t')
    if len(elements) > 1:
        if lastEnd != "":
            start = long(lastEnd) + long(elements[0])
            # [1] start = lastEnd + long(elements[0])
            # [2] start = lastEnd + int(elements[0])
            lastEnd = long(start + pLength)
            # [1] lastEnd = start + pLength
            sys.stdout.write("%s\t%ld\t%ld\t%s\n" % (chromosome, start, lastEnd, '\t'.join(elements[1:])))
        else:
            lastEnd = long(elements[0]) + long(pLength)
            # [1] lastEnd = long(elements[0]) + pLength
            # [2] lastEnd = int(elements[0]) + pLength
            sys.stdout.write("%s\t%ld\t%ld\t%s\n" % (chromosome, long(elements[0]), lastEnd, '\t'.join(elements[1:])))
    else:
        if elements[0].startswith('p'):
            pLength = long(elements[0][1:])
            # [2] pLength = int(elements[0][1:])
        else:
            start = long(long(lastEnd) + long(elements[0]))
            # [1] start = lastEnd + long(elements[0])
            # [2] start = lastEnd + int(elements[0])
            lastEnd = long(start + pLength)
            # [1] lastEnd = start + pLength
            sys.stdout.write("%s\t%ld\t%ld\n" % (chromosome, start, lastEnd))               
    return

Update after question about sys.stdout.write

If the statement that you commented out was anything like the original one:

sys.stdout.write("%s\t%ld\t%ld\t%s\n" % (chromosome, start, lastEnd, '\t'.join(elements[1:])))

Then your question is ... interesting. Try this:

payload = "%s\t%ld\t%ld\t%s\n" % (chromosome, start, lastEnd, '\t'.join(elements[1:]))
sys.stdout.write(payload)

Now comment out the sys.stdout.write statement ...

By the way, someone mentioned in a comment about breaking this into more than one write ... have you considered this? How many bytes on average in elements[1:] ? In chromosome?
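One way to separate the cost of building the string from the cost of writing it is to time the two independently. The following is only a sketch: the field values are made up, output goes to os.devnull to mimic the "> /dev/null" runs in the question, and the numbers will vary by machine and Python version.

```python
import os
import timeit

chromosome, start, last_end = "chrX", 1000, 1025
elements = ["1000", "id-1", "0.25", "+"]   # hypothetical fields


def format_only():
    # Build the output line but do not write it anywhere.
    return "%s\t%d\t%d\t%s\n" % (chromosome, start, last_end, '\t'.join(elements[1:]))


def time_both(number=200000):
    # Time formatting alone, then formatting plus a write to the null device.
    with open(os.devnull, "w") as sink:
        write = sink.write
        t_format = timeit.timeit(format_only, number=number)
        t_both = timeit.timeit(lambda: write(format_only()), number=number)
    return t_format, t_both


t_format, t_both = time_both()
print("format only:      %.3f s" % t_format)
print("format and write: %.3f s" % t_both)
```

If the two timings are close, most of the time saved by commenting out the write statement was really the formatting and the join, not the write itself.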

=== change of topic: It worries me that you initialise lastEnd to "" rather than to zero, and that nobody has commented on it. Anyway, you should fix this, which allows a rather drastic simplification and makes room for others' suggestions:

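def parseJarchLine(chromosome, line):
    # Assumes extractData() now initialises lastEnd = 0 (not "") and pLength = 0.
    global pLength
    global lastEnd
    elements = line.split('\t', 1)      # split off only the first field
    if elements[0][0] == 'p':           # "p<N>" lines set the element length
        pLength = int(elements[0][1:])
        return
    start = lastEnd + int(elements[0])
    lastEnd = start + pLength
    sys.stdout.write("%s\t%ld\t%ld" % (chromosome, start, lastEnd))
    if elements[1:]:
        # put back the tab that split() consumed
        sys.stdout.write("\t" + elements[1])
    sys.stdout.write("\n")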

Now I'm similarly worried about the two global variables lastEnd and pLength -- the parseJarchLine function is now so small that it can be folded back into the body of its sole caller, extractData, which saves two global variables, and a gazillion function calls. You could also save a gazillion lookups of sys.stdout.write by putting write = sys.stdout.write once up the front of extractData and using that instead.
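A rough sketch of what that folding might look like, pulled out into a standalone function so it can be run on its own. The name extract_lines, the out parameter and the sample lines are all hypothetical; in the real script this loop would sit inside extractData after the bz2.decompress call. It is written for Python 3 (io.StringIO expects text).

```python
import io
import sys


def extract_lines(chromosome, lines, out=sys.stdout):
    """Convert relative-offset lines to absolute coordinates (sketch)."""
    write = out.write          # look the bound method up once, not per line
    p_length = 0               # local state replaces the two globals
    last_end = 0
    for line in lines:
        elements = line.split('\t', 1)
        head = elements[0]
        if head.startswith('p'):
            p_length = int(head[1:])
            continue
        start = last_end + int(head)
        last_end = start + p_length
        if len(elements) > 1:
            write("%s\t%d\t%d\t%s\n" % (chromosome, start, last_end, elements[1]))
        else:
            write("%s\t%d\t%d\n" % (chromosome, start, last_end))


buf = io.StringIO()
extract_lines("chrX", ["p25", "1000\tid-1", "50"], out=buf)
print(buf.getvalue())
```

Keeping the counters local also means the function can be tested against an in-memory buffer, as above, without touching module-level state.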

BTW, the script tests for Python 2.5 or better; have you tried profiling on 2.5 and 2.6?
