Memory leakage issue in python list


Question

The identities list contains a big array of approximately 57,000 images. I am creating a negatives list with the help of itertools.product(). This stores the whole list in memory, which is very costly, and my system hangs after 4 minutes.

How can I optimize the code below and avoid storing everything in memory?

import itertools
import pandas as pd

negatives = []  # every negative pair accumulates here, in memory

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        # materializing the full cross product is what exhausts memory
        cross_product = itertools.product(samples_list[i], samples_list[j])
        cross_product = list(cross_product)

        for cross_sample in cross_product:
            negative = []
            negative.append(cross_sample[0])
            negative.append(cross_sample[1])
            negatives.append(negative)
            print(len(negatives))

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

negatives = negatives.sample(positives.shape[0])

Memory usage (9.30 GB and rising) keeps growing, and at some point the system hangs completely.
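To see why memory explodes, it helps to count the pairs without building them. A minimal sketch, assuming samples_list holds the per-identity image lists as in the code above:

import itertools

# Count negative pairs lazily: each pair of identities contributes
# len(a) * len(b) cross pairs, so sum those products without
# ever materializing the pairs themselves.
total = sum(len(a) * len(b)
            for a, b in itertools.combinations(samples_list, 2))
print(total)

With roughly 57,000 images in total, the count is bounded above by 57,000² / 2 ≈ 1.6 billion pairs; stored as Python lists of strings that is far more than a workstation's RAM, and written out as CSV rows it runs to tens of gigabytes or more.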

I also implemented the answer below and modified my code according to it.

# lazy iteration removes the temporary cross_product list
for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):
            negative = [cross_sample[0], cross_sample[1]]
            negatives.append(negative)  # but negatives itself still grows by one entry per pair
            print(len(negatives))

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

Third version of the code

The resulting CSV file is so big that even opening it triggers an alert that the program cannot load the whole file. The process runs for about ten minutes, and then the system hangs completely again.

import csv

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):
            # the file is reopened for every single pair, which is very slow
            with open('/home/khawar/deepface/tests/results.csv', 'a+') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow([cross_sample[0], cross_sample[1]])
            # each pair is still appended to the in-memory list as well,
            # so the memory problem remains
            negative = [cross_sample[0], cross_sample[1]]
            negatives.append(negative)

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

negatives = negatives.sample(positives.shape[0])

[Memory usage screenshot]

Answer

The generated pairs are all kept in memory, and that is why memory usage keeps growing. You have to change the code so that each pair is generated, used, and immediately released from memory instead of being accumulated.
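As an illustration of that principle, here is a minimal sketch (reusing the question's samples_list and output path) that consumes each pair as it is produced and writes it straight to disk, so memory stays flat:

import csv
import itertools

# Open the file once, write each pair as it is generated, and keep
# nothing in memory; itertools.product is consumed lazily.
with open('/home/khawar/deepface/tests/results.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["file_x", "file_y"])
    for a, b in itertools.combinations(samples_list, 2):
        for x, y in itertools.product(a, b):
            writer.writerow([x, y])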

Previous code:

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        cross_product = itertools.product(samples_list[i], samples_list[j])
        cross_product = list(cross_product)

        for cross_sample in cross_product:
            negative = []
            negative.append(cross_sample[0])
            negative.append(cross_sample[1])
            negatives.append(negative)
            print(len(negatives))

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

Memory-efficient code: save the pairs to a CSV the first time; on later runs there is no need to generate them again.

import itertools
from pathlib import Path

import pandas as pd
from tqdm import tqdm

samples_list = list(identities.values())
negatives = pd.DataFrame()

if Path("positives_negatives.csv").exists():
    # pairs were already generated on a previous run; just reload them
    df = pd.read_csv("positives_negatives.csv")
else:
    for combo in tqdm(itertools.combinations(identities.values(), 2), desc="Negatives"):
        for cross_sample in itertools.product(combo[0], combo[1]):
            negatives = negatives.append(pd.Series({"file_x": cross_sample[0], "file_y": cross_sample[1]}).T,
                                         ignore_index=True)
    negatives["decision"] = "No"
    negatives = negatives.sample(positives.shape[0])
    df = pd.concat([positives, negatives]).reset_index(drop=True)
    df.to_csv("positives_negatives.csv", index=False)
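One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, and appending one row at a time still materializes every pair before sampling. Since only positives.shape[0] negatives are kept in the end, a hypothetical alternative is to sample that many pairs directly; a sketch, assuming identities maps each identity to its list of image paths:

import random
import pandas as pd

# Draw only as many negative pairs as needed, never enumerating them all.
keys = list(identities)
needed = positives.shape[0]
pairs = set()  # deduplicates pairs drawn twice
while len(pairs) < needed:
    a, b = random.sample(keys, 2)              # two distinct identities
    pairs.add((random.choice(identities[a]),   # one image from each
               random.choice(identities[b])))

negatives = pd.DataFrame(list(pairs), columns=["file_x", "file_y"])
negatives["decision"] = "No"

This keeps memory proportional to the number of pairs actually needed rather than to the full cross product.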

