Athena skipping keys starting with underscore

Question

I'm trying to use AWS Athena to run some queries on JSON files we have stored in S3. I managed to create a simple schema and everything seemed fine until I noticed that some of my files are not accounted for.

The keys of the files are user IDs, and some of those start with _. All of those are missing in Athena. They exist in S3, I can fetch them, and they look just like the other files, but Athena does not see them.

Apparently it does not like underscores at the beginning of keys. Is there a way around this other than renaming all the files? Underscores elsewhere in the key do not seem to be an issue.

My schema (simplified by removing fields):

    CREATE EXTERNAL TABLE IF NOT EXISTS db.table (
      `user_id` string
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES ('serialization.format' = '1')
    LOCATION 's3://xyz/myfiles/'
    TBLPROPERTIES ('has_encrypted_data' = 'false');

Recommended answer

When you query a table, Amazon Athena uses Presto under the hood. Starting with Presto version 0.60, Presto ignores files whose names start with an underscore (_) or a dot (.). This matches the behavior of Hadoop MapReduce / Hive.

https://prestodb.io/docs/current/release/release-0.60.html

Presto filters out these hidden files using org.apache.hadoop.hive.common.FileUtils.HIDDEN_FILES_PATH_FILTER. Because this filter comes from Hive, the same applies to Hive tables: they also ignore such files in the table's location.
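To find out which of your S3 keys are affected, the hidden-file rule can be approximated locally. The following is a minimal Python sketch, not the actual Hive or Presto code: it assumes the filter looks only at the final path component and treats a leading _ or . as hidden. The names is_hidden_key and split_keys, and the sample keys, are hypothetical and used for illustration only.

```python
import posixpath

# Assumption: names starting with one of these prefixes are treated as hidden,
# mirroring the underscore/dot rule described in the answer.
HIDDEN_PREFIXES = ("_", ".")


def is_hidden_key(key: str) -> bool:
    """Return True if the last path component of an S3 key starts with '_' or '.'."""
    name = posixpath.basename(key.rstrip("/"))
    return name.startswith(HIDDEN_PREFIXES)


def split_keys(keys):
    """Split keys into (visible, hidden) lists according to is_hidden_key."""
    visible = [k for k in keys if not is_hidden_key(k)]
    hidden = [k for k in keys if is_hidden_key(k)]
    return visible, hidden


if __name__ == "__main__":
    # Hypothetical keys under the table location from the question.
    sample = [
        "myfiles/user_42.json",
        "myfiles/_user43.json",
        "myfiles/.user44.json",
    ]
    visible, hidden = split_keys(sample)
    print("visible to the query engine:", visible)
    print("skipped as hidden:", hidden)
```

Feeding this the key list from an S3 listing of the table location would give the set of files that would need a different name (for example, a prefix in front of the underscore) before Athena can read them.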
