Python For Loop Slows Due To Large List


Problem description


So currently I have a for loop that causes the Python program to die with the message 'Killed'. It slows down around 6000 items in, with the program slowly dying at around 6852 list items. How do I fix this?


I assume it's due to the list being too large.


I've tried splitting the list in two at around the 6000-item mark. Maybe it's due to memory management or something. Help would be appreciated.

    import psycopg2

    listoftexts = []

    for id in listofids:
        # A new connection is opened on every iteration
        connection = psycopg2.connect(user="username", password="password",
                                      host="localhost", port="5432",
                                      database="darkwebscraper")
        cursor = connection.cursor()
        cursor.execute("select darkweb.site_id, darkweb.site_title, darkweb.sitetext "
                       "from darkweb where darkweb.online='true' AND darkweb.site_id = %s",
                       (id,))
        print(len(listoftexts))

        try:
            row = cursor.fetchone()
        except psycopg2.Error:
            print("failed to fetch one")

        try:
            listoftexts.append(row[2])
            cursor.close()
            connection.close()
        except (TypeError, IndexError):
            print("failed to append")

Recommended answer


You're right, it's probably because the list becomes large: Python lists are contiguous spaces in memory. Each time you append to the list, Python checks whether there is a spot at the next position, and if not it relocates the whole array somewhere with enough room. The bigger your array, the more Python has to relocate.
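You can observe this over-allocation directly with a small sketch (CPython-specific: `sys.getsizeof` reports the list object's currently allocated size, which stays flat between reallocations and jumps when the list outgrows its block):

```python
import sys

lst = []
sizes = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))  # allocated size in bytes, not element count

# Most appends land in spare capacity; only a few trigger a reallocation,
# so the size sequence forms plateaus with occasional jumps.
jumps = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
print(f"{jumps} reallocations over {len(lst)} appends")
```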


One way around would be to create an array of the right size beforehand.
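With a plain Python list that can be a one-line change (a minimal sketch; `str(i)` just stands in for whatever value you collect):

```python
N = 1000
out = [None] * N       # allocate all N slots once, up front
for i in range(N):
    out[i] = str(i)    # index assignment never resizes the list
```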


Just to make sure it is clear, I made up an example to illustrate my point. I wrote two functions: the first appends the stringified index (to make the items bigger) to a list at each iteration, and the other just fills a preallocated numpy array:

import numpy as np
import matplotlib.pyplot as plt
from time import time

def test_bigList(N):
    L = []
    times = np.zeros(N,dtype=np.float32)

    for i in range(N):
        t0 = time()
        L.append(str(i))
        times[i] = time()-t0

    return times

def test_bigList_numpy(N):
    L = np.empty(N,dtype="<U32")
    times = np.zeros(N,dtype=np.float32)

    for i in range(N):
        t0 = time()
        L[i] = str(i)
        times[i] = time()-t0
    return times

N = int(1e7)
res1 = test_bigList(N)
res2 = test_bigList_numpy(N)

plt.plot(res1,label="list")
plt.plot(res2,label="numpy array")
plt.xlabel("Iteration")
plt.ylabel("Running time")
plt.legend()
plt.title("Evolution of iteration time with the size of an array")
plt.show()

I get the following results:


You can see on the figure that in the list case you regularly get spikes (probably due to relocation), and they seem to increase with the size of the list. This example uses short appended strings, but the bigger the strings, the more pronounced the effect.


If that does not do the trick, the slowdown might be linked to the database itself, but I can't help you without knowing the specifics of the database.
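One database-side thing worth checking in the question's loop is that it opens a new connection for every id. A hedged sketch of an alternative, reusing a single connection and psycopg2's `any(%s)` array parameter to fetch all rows in one round trip (table and column names are taken from the question; only the query-building part runs without a live database):

```python
# Executing this query requires a live PostgreSQL database; only the
# query string and helper are defined and checked here.

QUERY = ("select darkweb.site_id, darkweb.site_title, darkweb.sitetext "
         "from darkweb where darkweb.online = 'true' "
         "and darkweb.site_id = any(%s)")

def fetch_texts(connection, ids):
    # One round trip for the whole id list; psycopg2 adapts a Python
    # list to a PostgreSQL array for any(%s).
    with connection.cursor() as cursor:
        cursor.execute(QUERY, (ids,))
        return [row[2] for row in cursor.fetchall()]
```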

