Can I point multiple locations to the same Hive external table?

Question

I need to process several months of data at the same time. Is there a way to point multiple folders at a single external table? For example: create external table logdata(col1 string, col2 string........) location s3://logdata/april, s3://logdata/march

Recommended answer

Short answer: no. A Hive external table takes exactly one location when it is created; the metastore needs that single location to know where the table lives.

That being said, you can probably get away with using partitions: you can specify a location for each partition, which seems to be what you ultimately want, since you are splitting the data by month.

So create your table like this:

create external table logdata(col1 string, col2 string) partitioned by (month string) location 's3://logdata'

Then you can add partitions like this:

alter table logdata add partition(month='april') location 's3://logdata/april'
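As a rough sketch, the two folders from the question could be registered in a single statement. This assumes a Hive version that accepts several PARTITION clauses in one ALTER TABLE ... ADD, and that s3://logdata/march exists as described in the question. The IF NOT EXISTS clause is an addition here, included only so the statement can be re-run without failing on partitions that are already registered.

```sql
-- Register one partition per month folder; each partition has its own location.
-- The march path is taken from the question, not from the answer above.
ALTER TABLE logdata ADD IF NOT EXISTS
  PARTITION (month='march') LOCATION 's3://logdata/march'
  PARTITION (month='april') LOCATION 's3://logdata/april';

-- List the partitions the metastore now knows about.
SHOW PARTITIONS logdata;
```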

You do this for every month. You can then query the table and specify whichever partitions you want, and Hive will only read the directories of those partitions (for example, if you are only processing April and June, Hive will not load May).
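A minimal sketch of such a query, assuming partitions named 'april' and 'june' have already been added in the same way as the April example. Filtering on the partition column month is what lets Hive skip the other directories.

```sql
-- Only the directories registered for the april and june partitions are read.
SELECT col1, col2
FROM logdata
WHERE month IN ('april', 'june');
```

Because month is a partition column rather than a field stored in the files, its value comes from the partition definition, not from the data itself.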
