Calling an API concurrently in Python


Problem description

I need to talk to an API to get information about teams. Each team has a unique ID. I call the API with that ID and get back the list of players on that team (a list of dicts). One of the keys for each player is another ID that I can use to get more information about that player. I can bundle these player IDs together and make one API call to get the additional information for every player at once.

My question is this: I expect the number of teams to grow, and it could become quite large. The number of players on each team could also grow large.

What is the best way to make these API calls concurrently? I could use ThreadPool from multiprocessing.dummy, and I have also seen gevent used for something like this.

The calls to the API take some time to return a value (1-2 seconds for each bulk API call).

Right now, what I do is this:

for each team:
    get the list of players
    store the player_ids in a list
    get the player information for all the players (passing the list of player_ids)
assemble and process the information
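As a baseline, the sequential steps above could be sketched like this (a minimal sketch: `fetch_team` and `fetch_player_stats` are hypothetical stand-ins for the two API calls described in the question, with fake data so the sketch is self-contained):

```python
def fetch_team(team_id):
    # hypothetical stand-in for the per-team API call:
    # returns the team's players as a list of dicts
    return [{'id': team_id * 10 + n, 'name': f'player{n}'} for n in range(3)]

def fetch_player_stats(player_ids):
    # hypothetical stand-in for the bulk API call:
    # returns stats keyed by player ID
    return {pid: {'goals': pid % 5} for pid in player_ids}

all_stats = {}
for team_id in [1, 2, 3]:
    players = fetch_team(team_id)                      # one API call per team
    player_ids = [p['id'] for p in players]            # collect the IDs
    all_stats.update(fetch_player_stats(player_ids))   # one bulk call per team
```

Each loop iteration blocks on two round trips, which is what makes the sequential version slow once the number of teams grows.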

If I use ThreadPool, I can do the following:

from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(x)  # a ThreadPool of size x
results = pool.map(function_to_get_team_info, list_of_teams)
pool.close()
pool.join()
# process results

def function_to_get_team_info(team_id):
    players = api.call(team_id)
    return get_players_information(players)

def get_players_information(players):
    player_ids = [player['id'] for player in players]
    return get_all_player_stats(player_ids)

def get_all_player_stats(player_ids):
    return api.call(player_ids)

This processes each team concurrently and assembles all the information back in the ThreadPool results.

To make this completely concurrent, I think I would need to make my ThreadPool the size of the number of teams, but I don't think that scales well. So I was wondering whether using gevent to process this information would be a better approach.
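Note that a pool does not have to be as large as the number of teams to give useful concurrency: because the work is I/O-bound, a small fixed pool still overlaps most of the waiting. A minimal stdlib sketch (using concurrent.futures.ThreadPoolExecutor rather than gevent; the `fake_team_call` sleep is an assumption standing in for network latency):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_team_call(team_id):
    time.sleep(0.1)  # stand-in for the 1-2 s bulk API latency
    return team_id

team_ids = list(range(20))

start = time.time()
# pool of 5 workers, far smaller than the 20 "teams"
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_team_call, team_ids))
elapsed = time.time() - start
# 20 tasks at 0.1 s each, 5 at a time -> roughly 0.4 s, not 2.0 s
```

Because the threads spend almost all their time waiting on I/O, 5 workers finish 20 tasks in about a fifth of the sequential time; gevent achieves a similar overlap with greenlets instead of OS threads.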

Any suggestions would be very welcome.

Recommended answer

One solution is to:


  • prepare a list of tasks to be processed, in your case a list of team IDs,

  • create a fixed pool of n worker threads,

  • have each worker thread pop a task from the list and process it (download the data for a team); when finished, it pops the next task,

  • when the task list is empty, the worker thread stops.
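The steps above can be sketched with a queue.Queue and plain threads (a sketch; `process_team` is a hypothetical stand-in for the team download):

```python
import queue
import threading

def run_workers(task_ids, process_team, n_workers=4):
    """Fixed pool of n_workers threads popping team IDs from a shared queue."""
    tasks = queue.Queue()
    for team_id in task_ids:
        tasks.put(team_id)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                team_id = tasks.get_nowait()  # pop a task; stop when the list is empty
            except queue.Empty:
                return
            info = process_team(team_id)      # download the data for this team
            with lock:
                results[team_id] = info

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_workers([1, 2, 3], some_download_function, n_workers=2)` processes the three teams with only two threads; a slow team occupies one worker while the other keeps draining the queue.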

This approach shields you from the case where processing one particular team takes, say, 100 time units while the other teams are processed in 1 time unit (on average).

You can tune the number of worker threads depending on the number of teams, the average team processing time, the number of CPU cores, etc.

Extended answer

This can be achieved with the Python multiprocessing.Pool:

from multiprocessing import Pool

def api_call(id):
    pass # call API for given id

if __name__ == '__main__':
    p = Pool(5)
    p.map(api_call, [1, 2, 3])
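Since the work here is network I/O rather than CPU-bound computation, the thread-backed multiprocessing.dummy.Pool (the ThreadPool the question already mentions) exposes the same interface and may be the lighter choice: it avoids spawning processes and pickling arguments. A sketch with the same shape as the example above (the body of `api_call` is a stand-in):

```python
from multiprocessing.dummy import Pool  # same interface as multiprocessing.Pool, but threads

def api_call(team_id):
    # stand-in for the real API request
    return {'team': team_id}

pool = Pool(5)
results = pool.map(api_call, [1, 2, 3])
pool.close()
pool.join()
```

`pool.map` preserves input order, so `results` lines up with the list of team IDs passed in.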

