Memory error when appending to list in Python

Problem description

I have a list of 8000 website URLs. I would like to scrape the text off of the websites and save everything as a csv file. To do this I wanted to save each page's text in a list. This is my code so far, which is producing a "MemoryError".

import os
from splinter import *
import csv
import re
from inscriptis import get_text
from selenium.common.exceptions import WebDriverException


executable_path = {'executable_path' :'./phantomjs'}
browser = Browser('phantomjs', **executable_path)
links = []


with open('./Hair_Salons.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        for r in row:
            links.append(r)

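# Build a new list instead of calling links.remove() while iterating,
# which skips the element that follows each removed one.
links = [l for l in links if 'yelp' not in l]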

df = []

for k in links:
    temp = []
    temp2 = []
    browser.visit(k)

    if len(browser.find_link_by_partial_text('About'))>0:
        about = browser.find_link_by_partial_text('About')
        print(about['href'])
        try:
            browser.visit(about['href'])
            temp.append(get_text(browser.html)) # <----- This is where the error is occurring
        except WebDriverException:
            pass
    else:
        browser.visit(k)
        temp.append(get_text(browser.html))
    for s in temp:
        ss = re.sub(r'[^\w]', ' ', s)
        temp2.append(ss)

    temp2 = ' '.join(temp2)
    print(temp2.strip())

    df.append(temp2.strip())

with open('Hair_Salons text', 'w') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(df)

How can I avoid getting a memory error?

Recommended answer

If you can't hold all your data in memory, then don't. At a high level, your code has this structure:

for k in links:
    temp = []
    temp2 = []
    browser.visit(k)

    # do stuff that fills in temp

    for s in temp:
        ss = re.sub(r'[^\w]', ' ', s)
        temp2.append(ss)

    temp2 = ' '.join(temp2)
    print(temp2.strip())

    df.append(temp2.strip())

with open('Hair_Salons text', 'w') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(df)

So, you put lots of stuff into df (a plain list, despite the name), then write it all out at the end; you never use it inside the loop. Instead of calling df.append(temp2.strip()), write to the file at that point. Make sure you either open the file once, outside the loop (perhaps more sensible), or open it for appending (using 'a' instead of 'w').
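A minimal sketch of the "open once, write inside the loop" option, under a few assumptions: fetch_text is a hypothetical callable standing in for the browser.visit(...) and get_text(browser.html) steps, scrape_to_csv and clean_text are illustrative names that do not appear in the question, and each page is written as its own row (URL plus text) rather than as one cell of a single long row.

```python
import csv
import re


def clean_text(raw):
    """Replace every non-word character with a space, as the question's loop does."""
    return re.sub(r'[^\w]', ' ', raw).strip()


def scrape_to_csv(links, fetch_text, out_path):
    """Write one cleaned page per CSV row as soon as it has been fetched.

    fetch_text is a hypothetical callable (url -> page text); it is an
    assumption standing in for the splinter/inscriptis calls in the question.
    """
    written = 0
    # Open the file once, outside the loop. newline='' is what the csv
    # module documentation recommends for files passed to csv.writer.
    with open(out_path, 'w', newline='', encoding='utf-8') as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        for k in links:
            text = clean_text(fetch_text(k))
            # The row goes to the file here, so no list of all pages is built.
            wr.writerow([k, text])
            written += 1
    return written
```

With this shape, the code holds at most one page's text at a time instead of all 8000. Whether that alone removes the MemoryError depends on how much memory the browser process itself holds on to, which this sketch does not address.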
