AWS Athena concurrency limits: Number of submitted queries VS number of running queries

Problem description

According to the AWS Athena limits, you can submit up to 20 queries of the same type at a time, but this is a soft limit and can be increased on request. I use boto3 to interact with Athena, and my script submits 16 CTAS queries, each of which takes about 2 minutes to finish. I am the only one using the Athena service in this AWS account. However, when I look at the state of the queries in the console, I see that only a few of them (5 on average) are actually being executed, even though all of them are in the Running state; this is what the Athena history tab normally shows.

I understand that, after I submit queries to Athena, it processes them by assigning resources based on the overall service load and the number of incoming requests. But I tried running them on different days and at different hours, and I still got about 5 queries being executed at the same time.

So my question is: is this how it is supposed to be? If it is, then what is the point of being able to submit up to 20 queries if roughly 15 of them are idling and waiting for available slots?
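For reference, a submission script like the one described above might look roughly like the following. This is only a minimal sketch, not the actual script: the function name `submit_queries`, the database name, and the S3 output location are placeholders, and the client is passed in so that the function does not depend on how it was created. The `start_query_execution` call and its parameters are part of the boto3 Athena client.

```python
from typing import Any, List


def submit_queries(client: Any, queries: List[str], database: str,
                   output_location: str) -> List[str]:
    """Submit every query to Athena and return the query execution IDs.

    `client` is expected to behave like boto3.client("athena").
    `database` and `output_location` are placeholders for your own values.
    """
    execution_ids = []
    for sql in queries:
        response = client.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )
        # Athena returns immediately; the query itself runs asynchronously.
        execution_ids.append(response["QueryExecutionId"])
    return execution_ids


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   ids = submit_queries(boto3.client("athena"), ctas_statements,
#                        "my_database", "s3://my-bucket/athena-results/")
```

Because `start_query_execution` only accepts the query and returns an ID, a successful submission says nothing about when the query will actually start executing.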

Update 2019-09-26

I just stumbled across the Hive Connector page in the Presto documentation, which has a section called AWS Glue Catalog Configuration Properties. There we can see:

hive.metastore.glue.max-connections: Max number of concurrent connections to Glue (defaults to 5).

This made me wonder whether it has something to do with my issue. As I understand it, Athena is simply Presto running on an EMR cluster that is configured to use the AWS Glue Data Catalog as the metastore.

So what if my issue comes from the fact that the EMR cluster behind Athena simply uses the default value for concurrent connections to Glue? That default is 5, which is exactly how many concurrent queries are actually being executed (on average) in my case.

Update 2019-11-27

The Athena team recently deployed a host of new functionality for Athena. Although QUEUED has been in the state enum for some time, it hasn't been used until now. So now I get correct information about the query state in the history tab, but everything else remains the same.
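Now that QUEUED is reported, the split between queued and running queries can also be checked from a script instead of the console. The following is a minimal sketch under the assumption that the execution IDs were kept at submission time; the helper name `count_states` is made up for illustration, while `batch_get_query_execution` and the `Status.State` field are part of the boto3 Athena client.

```python
from collections import Counter
from typing import Any, Dict, List


def count_states(client: Any, execution_ids: List[str]) -> Dict[str, int]:
    """Return how many of the given query executions are in each state.

    `client` is expected to behave like boto3.client("athena").
    """
    states = Counter()
    # batch_get_query_execution accepts at most 50 IDs per call.
    for start in range(0, len(execution_ids), 50):
        chunk = execution_ids[start:start + 50]
        response = client.batch_get_query_execution(QueryExecutionIds=chunk)
        for execution in response["QueryExecutions"]:
            # State is one of QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED.
            states[execution["Status"]["State"]] += 1
    return dict(states)
```

Calling this in a loop while the 16 queries are in flight should show how many are RUNNING versus QUEUED at each moment.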

Also, another post was published describing a similar problem.

Solution

Your account's limits for the Athena service are not an SLA; they are more of a priority in the query scheduler.

Depending on available capacity, your queries may be queued even though you're not running any other queries. Exactly what a higher concurrency limit means is internal and could change, but in my experience it's best to think of it as the priority with which the query scheduler will deal with your query. Queries for all accounts run in the same server pool(s), and if everyone is running queries there will not be any capacity left for you.

You can see this in action by running the same query over and over again and then plotting the query execution metrics over time. You will notice that they vary a lot, and that the time your queries spend queued spikes at the top of every hour, when everyone else is running their scheduled queries.
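One way to look for those top-of-the-hour spikes is to group the reported queue time by the minute of the hour in which each query was submitted. The sketch below is an illustration, not a definitive method: the helper name `queue_seconds_by_minute` is hypothetical, and it assumes the input is a list of `QueryExecution` dictionaries as returned by boto3's `get_query_execution` or `batch_get_query_execution`, with `Status.SubmissionDateTime` and `Statistics.QueryQueueTimeInMillis` populated.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable


def queue_seconds_by_minute(executions: Iterable[Dict[str, Any]]) -> Dict[int, float]:
    """Average queue time in seconds, keyed by minute-of-hour of submission."""
    buckets = defaultdict(list)
    for execution in executions:
        submitted = execution["Status"]["SubmissionDateTime"]  # a datetime
        queued_ms = execution.get("Statistics", {}).get("QueryQueueTimeInMillis")
        if queued_ms is None:
            # Statistics may be missing, e.g. for queries that never started.
            continue
        buckets[submitted.minute].append(queued_ms / 1000.0)
    return {minute: mean(values) for minute, values in sorted(buckets.items())}
```

If the answer's observation holds for your account, the averages for minute 0 and the minutes right after it should stand out from the rest; the resulting dictionary can be passed to any plotting library.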
