Making multiple API calls in parallel using Python (IPython)


Question

I am working with Python (IPython & Canopy) and a RESTful content API, on my local machine (Mac).

I have an array of 3000 unique IDs to pull data for from the API, and can only call the API with one ID at a time.

I was hoping somehow to make 3 sets of 1000 calls in parallel to speed things up.

What is the best way of doing this?

Thanks in advance for your help!

Answer

Without more information about what you are doing in particular, it is hard to say for sure, but a simple threaded approach may make sense.

Assuming you have a simple function that processes a single ID:

import requests

url_t = "http://localhost:8000/records/%i"

def process_id(id):
    """process a single ID"""
    # fetch the data
    r = requests.get(url_t % id)
    # parse the JSON reply
    data = r.json()
    # and update some data with PUT
    requests.put(url_t % id, data=data)
    return data

You can expand that into a simple function that processes a range of IDs:

def process_range(id_range, store=None):
    """process a number of ids, storing the results in a dict"""
    if store is None:
        store = {}
    for id in id_range:
        store[id] = process_id(id)
    return store

Finally, you can fairly easily map sub-ranges onto threads to allow some number of requests to be concurrent:

from threading import Thread

def threaded_process_range(nthreads, id_range):
    """process the id range in a specified number of threads"""
    store = {}
    threads = []
    # create the threads
    for i in range(nthreads):
        ids = id_range[i::nthreads]
        t = Thread(target=process_range, args=(ids,store))
        threads.append(t)

    # start the threads
    for t in threads:
        t.start()
    # wait for the threads to finish
    for t in threads:
        t.join()
    return store
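
For example, the 3000 IDs from the question could be spread across 3 threads (roughly 1000 calls per thread). The ids list below is just a stand-in for your real array of IDs:

ids = list(range(3000))                 # stand-in for your real array of 3000 unique IDs
store = threaded_process_range(3, ids)  # 3 threads, each handling every 3rd ID
print(len(store))                       # one entry per ID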

A full example in an IPython Notebook: http://nbviewer.ipython.org/5732094

If your individual tasks take a more widely varied amount of time, you may want to use a ThreadPool, which will assign jobs one at a time (often slower if individual tasks are very small, but it guarantees better balance in heterogeneous cases).
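
As a rough sketch of that alternative (this uses the standard-library multiprocessing.pool.ThreadPool rather than the Thread-based helper above, and assumes the process_id function defined earlier):

from multiprocessing.pool import ThreadPool

def pooled_process(ids, nthreads=3):
    """process the ids with a thread pool, handing out one job at a time"""
    pool = ThreadPool(nthreads)
    try:
        # chunksize=1 dispatches one ID per task, which balances uneven task times
        results = pool.map(process_id, ids, chunksize=1)
    finally:
        pool.close()
        pool.join()
    # pool.map preserves input order, so ids can be zipped back to their results
    return dict(zip(ids, results))

Calling pooled_process(ids) returns the same kind of dict as threaded_process_range above.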

