Python-在本地保存请求或BeautifulSoup对象 [英] Python - save requests or BeautifulSoup object locally
问题描述
我有一些很长的代码,因此需要很长时间才能运行.我只想在本地保存请求对象(在这种情况下为"name")或BeautifulSoup对象(在这种情况下为"soup"),以便下次可以节省时间.这是代码:
I have some code that is quite long, so it takes a long time to run. I want to simply save either the requests object (in this case "name") or the BeautifulSoup object (in this case "soup") locally so that next time I can save time. Here is the code:
from bs4 import BeautifulSoup
import requests
url = 'SOMEURL'
name = requests.get(url)
soup = BeautifulSoup(name.content)
推荐答案
由于name.content
只是HTML
,因此您可以将其转储到文件中并稍后再读回.
Since name.content
is just HTML
, you can just dump this to a file and read it back later.
通常,瓶颈不是解析,而是发出请求的网络延迟.
Usually the bottleneck is not the parsing, but instead the network latency of making requests.
from bs4 import BeautifulSoup
import requests
url = 'https://google.com'
name = requests.get(url)
with open("/tmp/A.html", "w") as f:
f.write(name.content)
# read it back in
with open("/tmp/A.html") as f:
soup = BeautifulSoup(f)
# do something with soup
这里有一些轶事证据表明网络中存在瓶颈.
Here is some anecdotal evidence for the fact that bottleneck is in the network.
from bs4 import BeautifulSoup
import requests
import time
url = 'https://google.com'
t1 = time.clock();
name = requests.get(url)
t2 = time.clock();
soup = BeautifulSoup(name.content)
t3 = time.clock();
print t2 - t1, t3 - t2
通过在Thinkpad X1 Carbon上运行并具有快速园区网络的输出.
Output, from running on Thinkpad X1 Carbon, with a fast campus network.
0.11 0.02
这篇关于Python-在本地保存请求或BeautifulSoup对象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!