AWS Athena concurrency limits: Number of submitted queries VS number of running queries

Problem description

According to the AWS Athena limits, you can submit up to 20 queries of the same type at a time, but this is a soft limit and can be increased on request. I use boto3 to interact with Athena, and my script submits 16 CTAS queries, each of which takes about 2 minutes to finish. I am the only one using the Athena service in this AWS account. However, when I look at the state of the queries in the console, I see that only a few of them (5 on average) are actually being executed, even though all of them are in the Running state. Here is what I would normally see in the Athena History tab:

I understand that, after I submit queries to Athena, it processes them by assigning resources based on the overall service load and the number of incoming requests. But I tried running them on different days and at different hours, and I still got about 5 queries executing at the same time.
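
For reference, the submission side of a script like this might look roughly like the sketch below. It is not the original script: the function name, database, S3 output location, and CTAS statements are placeholders, and `client` is assumed to be a boto3 Athena client (`boto3.client("athena")`), passed in as a parameter so that any object with the same `start_query_execution` method works.

```python
from typing import Any, List


def submit_ctas_queries(
    client: Any,
    queries: List[str],
    database: str,
    output_location: str,
) -> List[str]:
    """Submit each statement with StartQueryExecution and return the execution IDs.

    `client` is assumed to be boto3.client("athena"); the database and
    output location are placeholders, not values from the original script.
    """
    query_ids = []
    for sql in queries:
        # Asynchronous call: it returns once Athena has accepted the query,
        # not when the query starts (or finishes) executing.
        response = client.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )
        query_ids.append(response["QueryExecutionId"])
    return query_ids
```

Because `start_query_execution` only submits the query, all 16 IDs come back right away; when each query actually begins executing is up to Athena, which is exactly the gap the question is about.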

So my question is: is this how it is supposed to be? If it is, then what is the point of being able to submit up to 20 queries if roughly 15 of them sit idle, waiting for available slots?

I just stumbled across the Hive Connector page in the Presto documentation, which has a section called AWS Glue Catalog Configuration Properties. There we can see:

hive.metastore.glue.max-connections: Max number of concurrent connections to Glue (defaults to 5).

This made me wonder whether it has something to do with my issue. As I understand it, Athena is simply Presto running on an EMR cluster that is configured to use the AWS Glue Data Catalog as its metastore.

So what if my issue comes from the fact that the EMR cluster behind Athena simply uses the default value for concurrent connections to Glue? That default is 5, which is exactly how many queries are actually being executed concurrently (on average) in my case.

The Athena team recently deployed a host of new functionality for Athena. Although QUEUED has been in the state enum for some time, it hadn't been used until now. So now I get correct information about query state in the History tab, but everything else remains the same.
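
Now that QUEUED is reported, a script can distinguish queries that are waiting for capacity from those that are actually running. Here is a minimal sketch, assuming a boto3 Athena client and a list of execution IDs collected at submission time. The function name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

# BatchGetQueryExecution accepts at most 50 execution IDs per request.
BATCH_SIZE = 50


def count_query_states(client: Any, query_ids: List[str]) -> Dict[str, int]:
    """Count the given queries by state (QUEUED, RUNNING, SUCCEEDED, ...).

    `client` is assumed to be boto3.client("athena"); any object with the
    same batch_get_query_execution method will do.
    """
    counts: Counter = Counter()
    for start in range(0, len(query_ids), BATCH_SIZE):
        batch = query_ids[start:start + BATCH_SIZE]
        response = client.batch_get_query_execution(QueryExecutionIds=batch)
        for execution in response["QueryExecutions"]:
            counts[execution["Status"]["State"]] += 1
        # IDs that Athena could not look up are reported separately.
        unprocessed = response.get("UnprocessedQueryExecutionIds", [])
        if unprocessed:
            counts["UNPROCESSED"] += len(unprocessed)
    return dict(counts)
```

If the scheduler is holding queries back, polling this while the 16 CTAS queries are in flight should show only a handful under RUNNING and the rest under QUEUED. Before QUEUED was used, those waiting queries would all have been counted as RUNNING.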

Also, another post was published describing a similar problem.

Recommended answer

Your account's limits for the Athena service are not an SLA; they are more of a priority in the query scheduler.

Depending on available capacity, your queries may be queued even when you're not running any other queries. Exactly what a higher concurrency limit means is internal and could change, but in my experience it's best to think of it as the priority with which the query scheduler will deal with your queries. Queries for all accounts run in the same server pool(s), and if everyone is running queries, there will not be any capacity left for you.

You can see this in action by running the same query over and over again and then plotting the query execution metrics over time. You will notice that they vary a lot, and that the time your queries spend queued spikes at the top of every hour, when everyone else is running their scheduled queries.
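
One way to collect those metrics is from the `Statistics` block that `GetQueryExecution` returns, which includes `QueryQueueTimeInMillis`. The sketch below averages queue time by the hour of each query's `SubmissionDateTime`. It assumes a boto3 Athena client and a list of execution IDs from repeated runs of the same query, and the helper name is hypothetical. The resulting dict can be passed to any plotting library.

```python
from collections import defaultdict
from typing import Any, Dict, List


def mean_queue_time_by_hour(client: Any, query_ids: List[str]) -> Dict[int, float]:
    """Average QueryQueueTimeInMillis per hour of the submission timestamp.

    `client` is assumed to be boto3.client("athena"). The hour is taken from
    SubmissionDateTime in whatever time zone that datetime carries.
    """
    samples: Dict[int, List[int]] = defaultdict(list)
    for query_id in query_ids:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        queue_ms = execution.get("Statistics", {}).get("QueryQueueTimeInMillis")
        if queue_ms is None:
            continue  # statistic not (yet) reported for this query, so skip it
        hour = execution["Status"]["SubmissionDateTime"].hour
        samples[hour].append(queue_ms)
    # Sorted by hour so the result can be plotted directly as a series.
    return {hour: sum(ms) / len(ms) for hour, ms in sorted(samples.items())}
```

Averaging by hour is a deliberately coarse choice. Plotting every individual run against its submission time would show the same top-of-the-hour spikes in more detail.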
