Amazon AWS Athena S3 and Glacier Mixed Bucket
Problem description
Amazon Athena Log Analysis Services with S3 Glacier
We have petabytes of data in S3. We are PubNub (https://www.pubnub.com/), and we store usage data for our network in S3 for billing purposes. The data is tab-delimited log files in a single S3 bucket. Athena is failing our queries with a HIVE_CURSOR_ERROR.
Our S3 bucket is set up to transition objects to AWS Glacier automatically after 6 months, so the bucket holds hot, readable S3 files alongside the Glacier backup files. Athena returns access errors because of this mix. The file referenced in the error is a Glacier backup.
My guess is the answer will be: don't keep Glacier backups in the same bucket. That is not an easy option for us given our data volume. I believe Athena will not work in this setup and that we will not be able to use it for our log analysis.
However, if there is a way we can use Athena, we would be thrilled. Is there a solution to the HIVE_CURSOR_ERROR and a way to skip Glacier files? Our S3 bucket is a flat bucket without folders.
The S3 object name is omitted from the screenshots above and below. The file referenced in the HIVE_CURSOR_ERROR is in fact a Glacier object, as you can see in this screenshot of our S3 bucket.
Note I tried to post on https://forums.aws.amazon.com/ but that was no bueno.
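One way to confirm that the object named in a HIVE_CURSOR_ERROR really is in Glacier is to ask S3 for its storage class. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the function names are hypothetical and the bucket and key are whatever the error message reports.

```python
def storage_class_of(head_response):
    """Return the storage class from an S3 HeadObject response dict.

    S3 omits the StorageClass field for STANDARD objects, so a missing
    key is treated as STANDARD.
    """
    return head_response.get("StorageClass", "STANDARD")


def is_glacier_object(bucket, key):
    """Return True if the given object is stored in the GLACIER class."""
    # boto3 is imported lazily so storage_class_of() works without it.
    import boto3

    s3 = boto3.client("s3")
    response = s3.head_object(Bucket=bucket, Key=key)
    return storage_class_of(response) == "GLACIER"
```

Calling is_glacier_object with the bucket name and the key from the error message would then tell you whether that file has already been transitioned.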
Since the release of February 18, 2019, Athena ignores objects with the GLACIER storage class instead of failing the query:
[…] As a result of fixing this issue, Athena ignores objects transitioned to the GLACIER storage class. Athena does not support querying data from the GLACIER storage class.
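Because ignored objects are silently left out of query results, it can help to know how much of the bucket is in each storage class. The sketch below is illustrative only: it assumes boto3 and read access to the bucket, and the helper names are hypothetical. Listing a bucket with petabytes of data this way may be slow, so treat it as a spot check rather than a recommended workflow.

```python
from collections import defaultdict


def summarize_by_storage_class(objects):
    """Count objects and bytes per storage class.

    `objects` is an iterable of dicts shaped like the "Contents" entries
    returned by S3 ListObjectsV2 (keys "Key", "Size", "StorageClass").
    """
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for obj in objects:
        storage_class = obj.get("StorageClass", "STANDARD")
        totals[storage_class]["count"] += 1
        totals[storage_class]["bytes"] += obj.get("Size", 0)
    return dict(totals)


def iter_bucket_objects(bucket):
    """Yield every object entry in the bucket, one listing page at a time."""
    # boto3 is imported lazily so the summary function works without it.
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        yield from page.get("Contents", [])
```

Passing iter_bucket_objects(bucket) to summarize_by_storage_class gives a per-class total; the GLACIER entry is the data Athena will skip.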