Python Requests taking a long time

Problem description

I am working on a Python project where I download and index files from the SEC EDGAR database. The problem is that when I use the requests module, it takes a very long time to save the text in a variable (roughly 130 to 170 seconds for one file).

The file has around 16 million characters, and I wanted to see whether there is an easy way to reduce the time it takes to retrieve the text. Example:

import requests

url = "https://www.sec.gov/Archives/edgar/data/0001652044/000165204417000008/goog10-kq42016.htm"

r = requests.get(url, stream=True)

print(r.text)

Thanks!

Recommended answer

What I found is in the code for r.text, specifically the case where no encoding was given (r.encoding is None). Detecting the encoding took 20 seconds, and I was able to skip that step by setting the encoding explicitly.

...
r.encoding = 'utf-8' 
...

Additional details

In my case, the response did not declare an encoding. The response size was 256k, and r.apparent_encoding took 20 seconds.

Look at the text property. It checks whether an encoding is set. If the encoding is None, it reads the apparent_encoding property, which scans the content to autodetect the encoding scheme.

On a long body this takes time. By setting the encoding of the response yourself (as described above), you skip the detection.
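The fallback described above can be sketched without touching the network. FakeResponse below is a hypothetical stand-in that imitates only the behaviour described in this answer; it is not the requests implementation. text_with_default_encoding is an illustrative helper name, not part of requests, and it assumes you already know the body is UTF-8.

```python
class FakeResponse:
    """Hypothetical stand-in for a response object; not the real requests class."""

    def __init__(self, content, encoding=None):
        self.content = content      # raw bytes of the body
        self.encoding = encoding    # None means no charset was declared
        self.detections = 0         # counts how often detection ran

    @property
    def apparent_encoding(self):
        # Stand-in for the slow step: a real detector scans the whole body.
        self.detections += 1
        return "utf-8"

    @property
    def text(self):
        # Fall back to detection only when no encoding is set.
        encoding = self.encoding
        if encoding is None:
            encoding = self.apparent_encoding
        return self.content.decode(encoding, errors="replace")


def text_with_default_encoding(response, default="utf-8"):
    """Return response.text, setting an assumed encoding first if none is set."""
    if response.encoding is None:
        response.encoding = default  # assumption: the body really is UTF-8
    return response.text


body = "Annual report: revenue grew 20%".encode("utf-8")

slow = FakeResponse(body)
slow.text                            # no encoding set, so detection runs

fast = FakeResponse(body)
text_with_default_encoding(fast)     # encoding set first, detection skipped

print(slow.detections, fast.detections)  # prints: 1 0
```

The helper only touches the encoding and text attributes, so it should work the same way on a real requests response. It leaves an encoding declared by the server untouched, and forcing a default is only safe when you know what the body is encoded in.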

Applied to your example above:

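from datetime import datetime
import requests

url = "https://www.sec.gov/Archives/edgar/data/0001652044/000165204417000008/goog10-kq42016.htm"

r = requests.get(url, stream=True)

# Encoding taken from the response headers; None if none was declared
print(r.encoding)

print(datetime.now())
# Run the detection once and keep the result
enc = r.apparent_encoding
print(enc)

print(datetime.now())
# If r.encoding is None, this access runs the detection again
print(r.text)
print(datetime.now())

# With the encoding set explicitly, the detection is skipped
r.encoding = enc
print(r.text)
print(datetime.now())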

Of course, the timestamps may get lost among the printed output, so I recommend running the above in an interactive shell. It may become apparent where you are losing the time even without printing datetime.now().
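If the timestamps are hard to pick out, one option is to collect the durations and print them at the end. The following is a minimal sketch using only the standard library; timed and timings are hypothetical names chosen for illustration, and the payload is a placeholder rather than the real filing.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


payload = b"x" * 1_000_000  # placeholder body, not the real download

with timed("decode"):
    text = payload.decode("utf-8")

with timed("count"):
    n = len(text)

# Print a short summary after all the work is done.
for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f} s")
```

To apply this to the example above, wrap requests.get(url), r.apparent_encoding, and r.text in separate with timed(...) blocks. time.perf_counter() is used because it is intended for measuring short intervals, whereas datetime.now() reports wall-clock time.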
