Calling an API concurrently in Python


Question

I need to talk to an api to get information about teams. Each team has a unique id. I call the api with that id, and I get a list of players on each team (list of dicts). One of the keys for a player is another id that I can use to get more information about that player. I can bundle all these player_ids and make a call to the api to get all the additional information for each player in one api call.

My question is this: I expect the number of teams to grow, and it could become quite large. Also, the number of players on each team could grow large.

What is the best way to make these api calls concurrently? I can use the ThreadPool from multiprocessing.dummy, and I have also seen gevent used for something like this.

The calls to the api take some time to get a return value (1-2 seconds for each bulk api call).

Right now, what I do is:

for each team:
    get the list of players
    store the player_ids in a list
    get the player information for all the players (passing the list of player_ids)
assemble and process the information

If I use ThreadPool, I can do the following:

from multiprocessing.dummy import Pool as ThreadPool

def function_to_get_team_info(team_id):
    players = api.call(team_id)
    player_info = get_players_information(players)
    return player_info

def get_players_information(players):
    player_ids = []
    for player in players:
        player_ids.append(player['id'])
    return get_all_player_stats(player_ids)

def get_all_player_stats(player_ids):
    return api.call(player_ids)

# create a ThreadPool of size x
pool = ThreadPool(x)
result = pool.map(function_to_get_team_info, list_of_teams)
pool.close()
pool.join()
# process results

This processes each team concurrently, and assembles all the information back in the ThreadPool results.

In order to make this completely concurrent, I think I would need to make my ThreadPool the size of the number of teams. But I don't think this scales well. So, I was wondering if using gevent to process this information would be a better approach.
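
For comparison, here is a minimal gevent sketch (assuming the same function_to_get_team_info and list_of_teams as above, and that the api client does blocking socket I/O that gevent's monkey-patching can make cooperative):

import gevent.monkey
gevent.monkey.patch_all()  # make blocking network I/O cooperative

from gevent.pool import Pool

pool = Pool(20)  # cap on concurrent greenlets; tune as needed
results = pool.map(function_to_get_team_info, list_of_teams)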

Any suggestions are very welcome.

Answer

One solution is to:

  • prepare a list of tasks to execute, in your case a list of team ids to process,
  • create a fixed pool of N thread workers,
  • let each worker thread pop a task off the list and process it (download the team data), then pop the next one when it is done,
  • stop each worker thread once the list of tasks is empty (see the sketch below).
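
A hand-rolled sketch of this pattern (assuming the api object and list_of_teams from the question; all other names are illustrative):

import threading
import queue

def worker(task_queue, results):
    # pop team ids until the queue is empty, then stop
    while True:
        try:
            team_id = task_queue.get_nowait()
        except queue.Empty:
            return
        results.append(api.call(team_id))  # download team data

task_queue = queue.Queue()
for team_id in list_of_teams:
    task_queue.put(team_id)

results = []  # list.append is thread-safe in CPython
workers = [threading.Thread(target=worker, args=(task_queue, results))
           for _ in range(5)]  # N = 5 worker threads
for t in workers:
    t.start()
for t in workers:
    t.join()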

This solution could save you in the case where processing a particular team takes e.g. 100 time units while other teams are processed in 1 time unit (on average).

You can tune the number of thread workers depending on the number of teams, the average team processing time, the number of CPU cores, etc.
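
For example, a rough sizing heuristic (my own illustration, not from the original answer; list_of_teams is assumed):

import os

# enough workers to overlap the 1-2 s API latency,
# but never more workers than there are teams
pool_size = min(len(list_of_teams), 4 * (os.cpu_count() or 1))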

Extended answer

This can be achieved with the Python multiprocessing.Pool:

from multiprocessing import Pool

def api_call(team_id):
    pass  # call the API for the given id

if __name__ == '__main__':
    p = Pool(5)
    p.map(api_call, [1, 2, 3])
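
Since these api calls are I/O-bound rather than CPU-bound, a thread-backed pool may be a better fit than processes; multiprocessing.dummy.Pool exposes the same interface, so the swap is a single import (an aside on the design choice, not part of the original answer):

from multiprocessing.dummy import Pool  # threads instead of processes, same interface

if __name__ == '__main__':
    p = Pool(5)
    p.map(api_call, [1, 2, 3])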
