Python - save requests or BeautifulSoup object locally

Problem description

I have some code that is quite long, so it takes a long time to run. I want to simply save either the requests object (in this case "name") or the BeautifulSoup object (in this case "soup") locally so that next time I can save time. Here is the code:

from bs4 import BeautifulSoup
import requests

url = 'SOMEURL'
name = requests.get(url)
soup = BeautifulSoup(name.content, "html.parser")

Recommended answer

Since name.content is just the page's HTML as bytes, you can dump it to a file and read it back later.

Usually the bottleneck is not the parsing, but instead the network latency of making requests.

from bs4 import BeautifulSoup
import requests

url = 'https://google.com'
name = requests.get(url)

# name.content is bytes, so open the file in binary mode
with open("/tmp/A.html", "wb") as f:
  f.write(name.content)


# read it back in
with open("/tmp/A.html", "rb") as f:
  soup = BeautifulSoup(f, "html.parser")
  # do something with soup
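
The save-then-reload idea above can be wrapped in a small helper that only hits the network when no cached copy exists. This is a minimal sketch, not part of the original answer: the names get_html_cached and fetch_with_requests, the cache_path argument, and the 10-second timeout are all assumptions made for illustration.

```python
from pathlib import Path


def fetch_with_requests(url):
    """Download the page body as bytes (needs the requests package)."""
    import requests  # imported here so cache hits work without it

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    return response.content


def get_html_cached(url, cache_path, fetch=fetch_with_requests):
    """Return the page bytes, downloading only if no cached copy exists."""
    path = Path(cache_path)
    if path.exists():
        return path.read_bytes()  # cache hit: no network round trip
    html = fetch(url)             # cache miss: download once
    path.write_bytes(html)        # keep the raw bytes for the next run
    return html
```

With this in place, something like BeautifulSoup(get_html_cached(url, "/tmp/A.html"), "html.parser") would rebuild the soup on each run while skipping the request after the first one. The cache never expires in this sketch, so delete the file to force a fresh download.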

Here is some anecdotal evidence that the bottleneck is in the network.

from bs4 import BeautifulSoup
import requests
import time

url = 'https://google.com'

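t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
name = requests.get(url)
t2 = time.perf_counter()
soup = BeautifulSoup(name.content, "html.parser")
t3 = time.perf_counter()

# request time, parse time (in seconds)
print(t2 - t1, t3 - t2)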

Output from running on a Thinkpad X1 Carbon with a fast campus network:

0.11 0.02
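
To repeat this kind of measurement on your own machine, a small timing wrapper can help. The sketch below is an illustration rather than the original author's code, and the helper name timed is hypothetical. It uses time.perf_counter, which measures wall-clock time, so time spent waiting on the network is included.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, name, t_fetch = timed(requests.get, url) followed by soup, t_parse = timed(BeautifulSoup, name.content, "html.parser") would give the two durations to compare. Individual runs vary, so it is worth timing several requests before drawing conclusions.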
