Query by "$path" field

Question

I want to query a single file, or a group of files, under a partition of a table. I found that when I filter on the "$path" field, Athena still scans the entire partition rather than only the files I want.

Is there a way to make this kind of query more efficient and scan only the given files? Something like partition pruning for files...

Here is an example query:

SELECT *
FROM my_table
WHERE day = '2019-01-01'
      AND "$path" = 's3://my-bucket/my-table/day=2019-01-01/my_file'

Recommended answer

No. It's not possible to get Athena to scan only the file you want by using $path, or any other method that I know of, without partitioning your table differently.

If this is a common operation, I suggest making your partitions smaller so that they match the files more closely. If it's just something you do once in a while, I wouldn't worry too much about it.
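
As a rough sketch of what "smaller partitions" could look like, the statements below add a second partition key so that each file (or small group of files) sits under its own prefix. The table name my_table_by_batch, the batch key, the column list, the row format, and the S3 prefix are all hypothetical and not from the question; the data would also have to be copied or rewritten into the new layout. Athena runs one statement per query, so each statement would be submitted separately.

```sql
-- Hypothetical table: same data as my_table, re-laid-out with a second
-- partition key. Column names, types and row format are placeholders.
CREATE EXTERNAL TABLE my_table_by_batch (
  id string,
  payload string
)
PARTITIONED BY (day string, batch string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my-bucket/my-table-by-batch/';

-- Register one partition per prefix (assumed layout: day=.../batch=.../).
ALTER TABLE my_table_by_batch ADD IF NOT EXISTS
  PARTITION (day = '2019-01-01', batch = 'my_file')
  LOCATION 's3://my-bucket/my-table-by-batch/day=2019-01-01/batch=my_file/';

-- Filtering on both partition keys lets partition pruning limit the scan
-- to the files under that one prefix.
SELECT *
FROM my_table_by_batch
WHERE day = '2019-01-01'
  AND batch = 'my_file';
```

The trade-off is more partitions to register and maintain, which is why this is only worth doing when per-file queries are frequent.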

If you have multiple access patterns, and this one isn't the primary pattern but is still fairly common, you can create a separate table that uses the org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat input format, and create a 1:1 structure of partitions, each containing a symlink.txt file that points to the corresponding file of the original table. You can read more about this way of creating tables in this StackOverflow answer (the second half), but I think it would be a very complicated way to solve the problem.
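
A minimal sketch of that symlink approach might look like the following. The table name my_table_by_file, the file_name partition key, the column list, the row format, and the my-table-symlinks prefix are assumptions made for illustration; only the input format class and the original file path come from the text above. It also assumes the underlying files are in a text-based format, since that is what this input format reads.

```sql
-- Hypothetical symlink table over the same files as my_table.
-- Column names, types and row format are placeholders and must match
-- the original table's data.
CREATE EXTERNAL TABLE my_table_by_file (
  id string,
  payload string
)
PARTITIONED BY (day string, file_name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://my-bucket/my-table-symlinks/';

-- One partition per original file. The partition location is assumed to
-- contain a single symlink.txt whose only line is the original file's URI:
--   s3://my-bucket/my-table/day=2019-01-01/my_file
ALTER TABLE my_table_by_file ADD IF NOT EXISTS
  PARTITION (day = '2019-01-01', file_name = 'my_file')
  LOCATION 's3://my-bucket/my-table-symlinks/day=2019-01-01/file_name=my_file/';

-- The per-file filter is now a partition filter instead of a "$path" filter.
SELECT *
FROM my_table_by_file
WHERE day = '2019-01-01'
  AND file_name = 'my_file';
```

The original data is not copied here; only the small symlink.txt files and the extra partitions have to be created and kept in sync with the original table.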
