AWS Athena concurrency limits: number of submitted queries vs. number of running queries

Views: 303 Published: 2020/4/30 11:32:54 Tags: concurrency limit amazon-emr amazon-athena aws-glue

Problem description

According to the AWS Athena limitations, you can submit up to 20 queries of the same type at a time, but this is a soft limit and can be increased on request. I use boto3 to interact with Athena, and my script submits 16 CTAS queries, each of which takes about 2 minutes to finish. I am the only one using the Athena service in this AWS account. However, when I look at the query states in the console, only a few queries (5 on average) are actually being executed, even though all of them are in the Running state. This is what I normally see in the Athena History tab.
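For readers who want to reproduce the setup, here is a minimal sketch of how such a batch of CTAS queries might be submitted. It is not the asker's actual script. The helper names `build_ctas` and `submit_ctas_queries`, the table names, the database and the S3 output location are all hypothetical. The `athena` argument is assumed to be a client created with `boto3.client("athena")`, and only its `start_query_execution` call is used.

```python
from typing import Any, List


def build_ctas(target_table: str, source_table: str) -> str:
    """Build a simple CTAS statement (table names are hypothetical)."""
    return f"CREATE TABLE {target_table} AS SELECT * FROM {source_table}"


def submit_ctas_queries(
    athena: Any,
    statements: List[str],
    database: str,
    output_location: str,
) -> List[str]:
    """Submit every statement and return the query execution IDs in order.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    execution_ids: List[str] = []
    for sql in statements:
        # start_query_execution returns immediately; the query runs asynchronously.
        response = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )
        execution_ids.append(response["QueryExecutionId"])
    return execution_ids
```

With a real client, a call such as `submit_ctas_queries(boto3.client("athena"), [build_ctas(f"t_{i}", "src") for i in range(16)], "my_db", "s3://my-bucket/results/")` would submit 16 queries at once; the database and bucket names here are placeholders.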

I understand that, after I submit queries to Athena, it processes them by assigning resources based on the overall service load and the number of incoming requests. But I tried running them on different days and at different hours, and still only about 5 queries were executed at the same time.

So my question is: is this how it is supposed to be? If so, what is the point of being able to submit up to 20 queries if roughly 15 of them sit idle, waiting for available slots?

I just stumbled across the Hive Connector page in the Presto documentation, which has a section on AWS Glue Catalog Configuration Properties. There we can see:

hive.metastore.glue.max-connections: Max number of concurrent connections to Glue (defaults to 5).

This made me wonder whether it has something to do with my issue. As I understand it, Athena is simply Presto running on an EMR cluster that is configured to use the AWS Glue Data Catalog as the metastore.

So what if my issue comes from the fact that the EMR cluster behind Athena simply uses the default value for concurrent connections to Glue, which is 5, exactly the number of concurrent queries that actually get executed (on average) in my case?

The Athena team recently deployed a host of new functionality for Athena. Although QUEUED has been in the state enum for some time, it hasn't been used until now. So now I get correct information about query state in the History tab, but everything else remains the same.
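The split between queued and running queries can also be observed from a script rather than the History tab. The sketch below is one possible way to tally states, not something taken from the original post. The function name `count_query_states` is hypothetical, and `athena` is again assumed to be a client from `boto3.client("athena")`. It relies on `batch_get_query_execution`, which accepts at most 50 IDs per call, and reads each execution's `Status.State` (QUEUED, RUNNING, SUCCEEDED, FAILED or CANCELLED).

```python
from collections import Counter
from typing import Any, Dict, List

# BatchGetQueryExecution accepts at most 50 query execution IDs per request.
BATCH_SIZE = 50


def count_query_states(athena: Any, execution_ids: List[str]) -> Dict[str, int]:
    """Return how many of the given executions are in each state.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    counts: Counter = Counter()
    for start in range(0, len(execution_ids), BATCH_SIZE):
        chunk = execution_ids[start:start + BATCH_SIZE]
        response = athena.batch_get_query_execution(QueryExecutionIds=chunk)
        for execution in response["QueryExecutions"]:
            counts[execution["Status"]["State"]] += 1
    return dict(counts)
```

Calling this every few seconds after submitting a batch should show how many queries Athena reports as QUEUED versus RUNNING at a given moment.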

Also, another post was published describing a similar problem.
