Memory error when appending to list in Python
Question
I have a list of 8000 website URLs. I would like to scrape the text off the websites and save everything as a CSV file. To do this I wanted to save each page's text in a list. This is my code so far, which is producing a "MemoryError".
import os
from splinter import *
import csv
import re
from inscriptis import get_text
from selenium.common.exceptions import WebDriverException

executable_path = {'executable_path' :'./phantomjs'}
browser = Browser('phantomjs', **executable_path)

links = []
with open('./Hair_Salons.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        for r in row:
            links.append(r)

for l in links:
    if 'yelp' in l:
        links.remove(l)

df = []
for k in links:
    temp = []
    temp2 = []
    browser.visit(k)
    if len(browser.find_link_by_partial_text('About'))>0:
        about = browser.find_link_by_partial_text('About')
        print(about['href'])
        try:
            browser.visit(about['href'])
            temp.append(get_text(browser.html)) # <----- This is where the error is occurring
        except WebDriverException:
            pass
    else:
        browser.visit(k)
        temp.append(get_text(browser.html))
    for s in temp:
        ss = re.sub(r'[^\w]', ' ', s)
        temp2.append(ss)
    temp2 = ' '.join(temp2)
    print(temp2.strip())
    df.append(temp2.strip())

with open('Hair_Salons text', 'w') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(df)
How can I avoid getting a memory error?
Recommended answer
If you can't hold all your data in memory, then don't. At a high level, your code has this structure:
for k in links:
    temp = []
    temp2 = []
    browser.visit(k)

    # do stuff that fills in temp

    for s in temp:
        ss = re.sub(r'[^\w]', ' ', s)
        temp2.append(ss)

    temp2 = ' '.join(temp2)
    print(temp2.strip())
    df.append(temp2.strip())

with open('Hair_Salons text', 'w') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(df)
So, you put lots of stuff into df (a plain list here, despite the name), then write it out only at the end; you never use it inside the loop. Instead of calling df.append(temp2.strip()), write to the file at that point.
Make sure you either open the file once, outside the loop (perhaps more sensible), or open it for appending (using 'a' instead of 'w').
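The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the asker's exact program: scrape_to_csv, clean, and the fetch_text parameter are hypothetical names, with fetch_text standing in for the browser logic that fills temp in the question. Note also that it writes one CSV row per page, whereas the original wrote every page into a single row.

```python
import csv
import re


def clean(text):
    """Replace every non-word character with a space, as the question does."""
    return re.sub(r'[^\w]', ' ', text).strip()


def scrape_to_csv(links, fetch_text, out_path):
    """Write each page's text to the CSV as soon as it is available.

    fetch_text is a hypothetical callable (url -> page text) standing in
    for the browser.visit / get_text steps from the question.
    """
    # Open the file once, outside the loop, so nothing accumulates in a list.
    with open(out_path, 'w', newline='') as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        for k in links:
            text = fetch_text(k)
            # One row per page; only the current page is held in memory.
            wr.writerow([clean(text)])
```

With this shape, memory use depends on the largest single page rather than on all 8000 pages together. If the file had to be reopened on every iteration instead, mode 'a' would be needed so that earlier rows are not overwritten.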