Making multiple API calls in parallel using Python (IPython)


Question

I am working with Python (IPython & Canopy) and a RESTful content API, on my local machine (Mac).

I have an array of 3000 unique IDs to pull data for from the API, and I can only call the API with one ID at a time.

I was hoping somehow to make 3 sets of 1000 calls in parallel to speed things up.

What is the best way of doing this?

Thanks in advance for your help!

Answer

Without more information about what you are doing in particular, it is hard to say for sure, but a simple threaded approach may make sense.

Assume you have a simple function that processes a single ID:

import requests

url_t = "http://localhost:8000/records/%i"

def process_id(record_id):
    """Process a single ID: fetch its record, then update it."""
    # fetch the data
    r = requests.get(url_t % record_id)
    # parse the JSON reply
    data = r.json()
    # and update some data with PUT
    requests.put(url_t % record_id, data=data)
    return data

You can expand that into a simple function that processes a range of IDs:

def process_range(id_range, store=None):
    """Process a number of IDs, storing the results in a dict."""
    if store is None:
        store = {}
    for record_id in id_range:
        store[record_id] = process_id(record_id)
    return store

And finally, you can fairly easily map sub-ranges onto threads to allow some number of requests to be concurrent:

from threading import Thread

def threaded_process_range(nthreads, id_range):
    """process the id range in a specified number of threads"""
    store = {}
    threads = []
    # create the threads
    for i in range(nthreads):
        ids = id_range[i::nthreads]
        t = Thread(target=process_range, args=(ids,store))
        threads.append(t)

    # start the threads
    for t in threads:
        t.start()
    # wait for the threads to finish
    for t in threads:
        t.join()
    return store
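The `id_range[i::nthreads]` slice hands out IDs round-robin, so each thread gets an interleaved subset of roughly equal size. A quick illustration with 10 IDs and 3 threads:

```python
# Round-robin split: thread i takes every nthreads-th ID, starting at i
ids = list(range(10))
nthreads = 3
slices = [ids[i::nthreads] for i in range(nthreads)]
print(slices)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```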

A full example in an IPython Notebook: http://nbviewer.ipython.org/5732094

If your individual tasks take a more widely varied amount of time, you may want to use a ThreadPool, which will assign jobs one at a time (often slower if individual tasks are very small, but it guarantees better balance in heterogeneous cases).
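A minimal sketch of that pool-based approach using the standard library's `multiprocessing.pool.ThreadPool`, with a stand-in `fetch_record` worker in place of the real `requests` calls (the function names and worker body here are illustrative, not from the original answer):

```python
from multiprocessing.pool import ThreadPool

def fetch_record(record_id):
    """Stand-in worker; swap in the real requests.get/put calls here."""
    return {"id": record_id}

def pool_process(ids, nthreads=3):
    """Process IDs with a thread pool, handing out one job at a time."""
    with ThreadPool(nthreads) as pool:
        # chunksize=1 makes the pool dispatch single jobs, which is what
        # gives the better load balance for tasks of uneven duration
        results = pool.map(fetch_record, ids, chunksize=1)
    return dict(zip(ids, results))
```

On modern Python, `concurrent.futures.ThreadPoolExecutor` offers the same pattern via `executor.map`.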
