My Python program is very slow! How can I speed it up? Am I doing something wrong?

Question

I ran the Python profiler after commenting out the webbrowser and Firefox portions of the code, because I knew they were going to be the slowest parts. Of what remains, the most time-consuming calls in my program are re.findall and re.compile, followed by len and appending to a list.

I don't know if I should post all of my code here at once, because I worked really hard on my program (even if it isn't too good), so for now I'm just going to ask: how do I make my Python program faster?

Right now I have 3 suspects for why it is so slow:

  1. Maybe my computer is just slow
  2. Maybe my internet is too slow (sometimes my program has to download the HTML of web pages and then search through it for a specific piece of text)
  3. My code is slow (too many loops maybe? something else? I'm new to this so I wouldn't know!)

If anyone could offer me advice, I would greatly appreciate it!

Thanks!

I think my code uses lots of loops. Another thing: for the program to work, you have to be logged in to this website: http://www.locationary.com/

from urllib import urlopen
from gzip import GzipFile
from cStringIO import StringIO
import re
import urllib
import urllib2
import webbrowser
import time
from difflib import SequenceMatcher
import os

def download(url):
    s = urlopen(url).read()
    if s[:2] == '\x1f\x8b': # assume it's gzipped data
        with GzipFile(mode='rb', fileobj=StringIO(s)) as ifh:
            s = ifh.read()
    return s

for t in range(3,39):
    print t
    s = download('http://www.locationary.com/place/en/US/Utah/Provo-page' + str(t) + '/?ACTION_TOKEN=NumericAction')
    findLoc = re.compile('http://www\.locationary\.com/place/en/US/.{1,50}/.{1,50}/.{1,100}\.jsp')
    findLocL = re.findall(findLoc,s)
    W = []
    X = []
    XA = []
    Y = []
    YA = []
    Z = []
    ZA = []

    for i in range(0,25):
        b = download(findLocL[i])        
        findYP = re.compile('http://www\.yellowpages\.com/')
        findYPL = re.findall(findYP,b)
        findTitle = re.compile('<title>(.*) \(\d{1,10}.{1,100}\)</title>')
        getTitle = re.findall(findTitle,b)        
        findAddress = re.compile('<title>.{1,100}\((.*), .{4,14}, United States\)</title>')
        getAddress = re.findall(findAddress,b)        
        if not findYPL:
            if not getTitle:
                print ""
            else:
                W.append(findLocL[i])
            b = download(findLocL[i])
            if not getTitle:
                print ""
            else:
                X.append(getAddress)
            b = download(findLocL[i])
            if not getTitle:
                print ""
            else:
                Y.append(getTitle)
    sizeWXY = len(W)

    def XReplace(text, dic):
        for i, j in dic.iteritems():
            text = text.replace(i, j)  
        XA.append(text)

    def YReplace(text2, dic2):
        for k, l in dic2.iteritems():
            text2 = text2.replace(k, l)  
        YA.append(text2)

    for d in range(0,sizeWXY):
        old = str(X[d])
        reps = {' ':'-', ',':'', '\'':'', '[':'', ']':''}
        XReplace(old, reps)
        old2 = str(Y[d])
        YReplace(old2, reps)

    count = 0    
    for e in range(0,sizeWXY):
        newYPL = "http://www.yellowpages.com/" + XA[e] + "/" + YA[e] + "?order=distance"
        v = download(newYPL)
        abc = str('<h3 class="business-name fn org">\n<a href="')
        dfe = str('" class="no-tracks url "')
        findFinal = re.compile(abc + '(.*)' + dfe)
        getFinal = re.findall(findFinal, v)
        if not getFinal:
            W.remove(W[(e-count)])
            X.remove(X[(e-count)])
            count = (count+1)
        else:
            for f in range(0,1):
                Z.append(getFinal[f])
    XA = []
    for c in range(0,(len(X))):
        aGd = re.compile('(.*), .{1,50}')
        bGd = re.findall(aGd, str(X[c]))
        XA.append(bGd)
    LenZ = len(Z)
    V = []
    for i in range(0,(len(W))):
        if i == 0:
            countTwo = 0
        gda = download(Z[i-(countTwo)])
        ab = str('"street-address">\n')
        cd = str('\n</span>')
        ZAddress = re.compile(ab + '(.*)' + cd)
        ZAddress2 = re.findall(ZAddress, gda)
        for b in range(0,(len(ZAddress2))):
            if not ZAddress2[b]:
                print ""
            else:
                V.append(str(ZAddress2[b]))
                a = str(W[i-(countTwo)])
                n = str(Z[i-(countTwo)])
                c = str(XA[i])
                d = str(V[i])
                #webbrowser.open(a)
                #webbrowser.open(n)
                m = SequenceMatcher(None, c, d)
                if m.ratio() < 0.50:
                    Z.remove(Z[i-(countTwo)])
                    W.remove(W[i-(countTwo)])
                    countTwo = (countTwo+1)

    def ZReplace(text3, dic3):
        for p, q in dic3.iteritems():
            text3 = text3.replace(p, q)  
        ZA.append(text3)

    for y in range(0,len(Z)):
        old3 = str(Z[y])
        reps2 = {':':'%3A', '/':'%2F', '?':'%3F', '=':'%3D'}
        ZReplace(old3, reps2)
    for z in range(0,len(ZA)):
        findPID = re.compile('\d{5,20}')
        getPID = re.findall(findPID,str(W[z]))
        newPID = re.sub("\D", "", str(getPID))
        finalURL = "http://www.locationary.com/access/proxy.jsp?ACTION_TOKEN=proxy_jsp$JspView$SaveAction&inPlaceID=" + str(newPID) + "&xxx_c_1_f_987=" + str(ZA[z])
        webbrowser.open(finalURL)
        time.sleep(5)

    os.system("taskkill /F /IM firefox.exe")

Recommended answer

The first thing to do when a program is slow is to identify the bottlenecks: you want to optimize the parts that actually take a long time, not the parts that are already fast. In Python, the most effective way to do this is with one of the Python profilers, which are dedicated tools for performance analysis. Here is a quick start:

python -m cProfile -o prof.dat <prog> <args>

runs your program and stores the profiling information in prof.dat. Then,

python -m pstats prof.dat

runs pstats, the tool for analyzing the profiling information. Important pstats commands include:

sort time

which sorts functions by the time spent in them. You can use a different sort key instead of time (cumulative, ...). Another important command is

stats

which prints the statistics (or stats 10 to print only the first 10 entries, which are the most time-consuming functions once you have sorted by time). You can obtain help with ?, or with help <command>.
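The same steps can also be driven from inside a script with the standard cProfile and pstats modules. The following is a minimal sketch in Python 3 (the question's code is Python 2); main, count_words, and profile_report are hypothetical names invented for this illustration, and the workload is a stand-in for the real program.

```python
import cProfile
import io
import pstats
import re


def count_words(text):
    """Hypothetical workload: count word-like tokens in a string."""
    return len(re.findall(r"\w+", text))


def main():
    """Hypothetical entry point standing in for the real program."""
    text = "the quick brown fox jumps over the lazy dog " * 2000
    return sum(count_words(text) for _ in range(20))


def profile_report(func, limit=10, sort_key="time"):
    """Run func under cProfile and return (result, report text).

    sort_key takes the same keys as the interactive 'sort' command,
    for example "time" or "cumulative".
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(main)
    print(report)
```

Passing sort_key="cumulative" instead ranks functions by the time spent in them plus everything they call, which tends to be more useful for finding the slow top-level step.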

Optimizing your program then consists of dealing with the particular code that causes the bottlenecks. You can post the timing results and maybe get more specific help on the sections of the program that would benefit most from optimization.
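As an illustration of dealing with one specific bottleneck, here is a sketch based on two patterns visible in the question's code: re.compile is called inside the inner loop, and download(findLocL[i]) is called three times for the same page. The sketch is written in Python 3 rather than the question's Python 2, collect_titles and the fetch parameter are hypothetical names, and it uses search (first match only) where the question uses findall. Note that the re module already caches recently compiled patterns, so hoisting the compile calls probably saves little; the repeated downloads are likely the larger cost, but only profiling the real program can confirm that.

```python
import re

# Compile each pattern once at import time instead of on every loop pass.
# Both patterns are copied from the question's code.
TITLE_RE = re.compile(r'<title>(.*) \(\d{1,10}.{1,100}\)</title>')
YELLOWPAGES_RE = re.compile(r'http://www\.yellowpages\.com/')


def collect_titles(urls, fetch):
    """Return (url, title) pairs for pages that do not link to Yellow Pages.

    fetch is any callable that maps a URL to its HTML text. Each URL is
    fetched exactly once and the result is reused for every search.
    """
    results = []
    for url in urls:
        html = fetch(url)  # one download per page
        if YELLOWPAGES_RE.search(html):
            continue
        match = TITLE_RE.search(html)
        if match:
            results.append((url, match.group(1)))
    return results


if __name__ == "__main__":
    # Hypothetical in-memory pages so the sketch runs without network access.
    fake_pages = {
        "page-a": "<title>Cafe Rio (123 Main St, Provo, United States)</title>",
        "page-b": "<title>Joe's (5 Elm St, Provo, United States)</title>"
                  " http://www.yellowpages.com/",
    }
    print(collect_titles(["page-a", "page-b"], fake_pages.__getitem__))
```

Taking fetch as a parameter keeps the sketch testable offline; in the real program it would be the question's own download function.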
