Multithreading to Scrape Yahoo Finance

Views: 57 | Published: 2021/6/4 20:13:55 | Tags: python, multithreading, yahoo-finance

Problem description

I'm running a program to pull some information from Yahoo! Finance. It runs fine as a for loop, but it takes a long time (about 10 minutes for 7,000 inputs) because it has to process each requests.get(url) call individually (or am I mistaken about the main bottleneck?).

Anyway, I came across multithreading as a potential solution. This is what I have tried:

import requests
import pprint
import threading

with open('MFTop30MinusAFew.txt', 'r') as ins: #input file for tickers
    for line in ins:
        ticker_array = ins.read().splitlines()

ticker = ticker_array
url_array = []
url_data = []
data_array =[]

for i in ticker:
    url = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/'+i+'?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com'
    url_array.append(url) #loading each complete url at one time 

def fetch_data(url):
    urlHandler = requests.get(url)
    data = urlHandler.json()
    data_array.append(data)

pprint.pprint(data_array)

threads = [threading.Thread(target=fetch_data, args=(url,)) for url in url_array]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

fetch_data(url_array)

The error I get is InvalidSchema: No connection adapters were found for '['https://query2.finance.... [url continue].
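The quoted message begins with `'['https://...`, which is the text form of a whole list. That suggests the final call, `fetch_data(url_array)`, is what hands `requests.get` a list instead of a single URL. Two other lines in the script also look unintended: `pprint.pprint(data_array)` runs before any thread has started, and the `for line in ins:` loop appears to consume the first line of the file before `ins.read()` is called. The sketch below is one possible arrangement under those assumptions, not a definitive fix. `read_tickers`, `fetch_all`, and `fetch_json` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
import threading


def read_tickers(lines):
    """Return one ticker per non-empty line, keeping the first line too."""
    return [line.strip() for line in lines if line.strip()]


def fetch_all(urls, fetch_one):
    """Call fetch_one(url) in one thread per URL; return {url: result}."""
    results = {}
    lock = threading.Lock()

    def worker(url):
        data = fetch_one(url)  # always a single URL string, never the list
        with lock:             # guard the shared dict explicitly
            results[url] = data

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results  # only complete once every thread has been joined


def fetch_json(url):
    """Download one URL and decode its JSON body (assumed fetch function)."""
    import requests  # third-party; imported here so the helpers above load without it
    return requests.get(url, timeout=10).json()
```

With these helpers, the end of the script could read `results = fetch_all(url_array, fetch_json)` followed by `pprint.pprint(results)`, with no separate `fetch_data(url_array)` call afterwards.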

PS: I've also read that using a multithreaded approach to scrape websites is bad and can get you blocked. Would Yahoo! Finance mind if I pull data for a couple thousand tickers at once? Nothing happened when I requested them sequentially.
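Whether Yahoo! Finance tolerates this load is not something the question establishes. One common way to be gentler than starting one thread per ticker is to cap how many requests are in flight at once. The following is a minimal sketch of that idea using the standard library's `concurrent.futures.ThreadPoolExecutor`; the name `fetch_bounded` and the default of 8 workers are assumptions chosen for illustration, and `fetch_one` stands for whatever function downloads a single URL.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_bounded(urls, fetch_one, max_workers=8):
    """Fetch every URL with at most max_workers calls running at once.

    Results come back in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, not completion order
        return list(pool.map(fetch_one, urls))
```

A lower `max_workers` trades speed for a lighter request rate; if any single call raises, the exception surfaces when its result is read from `pool.map`.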
