我可以按/存储通过"CREATE TABLE AS SELECT ....."创建的表进行聚类吗?在蜂巢? [英] Can I cluster by/bucket a table created via "CREATE TABLE AS SELECT....." in Hive?

查看:109
本文介绍了我可以按/存储通过"CREATE TABLE AS SELECT ....."创建的表进行聚类吗?在蜂巢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在Hive中创建一个表

I am trying to create a table in Hive

CREATE TABLE BUCKET_TABLE AS 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll
CLUSTERED BY (key) INTO 1000 BUCKETS;

此语法失败-但我不确定是否有可能执行此组合语句.有任何想法吗?

This syntax is failing - but I am not sure if it is even possible to do this combined statement. Any ideas?

推荐答案

在这个问题上出现问题,发现没有提供答案.我进一步看了一下,并在Hive文档中找到了答案.

Came across this question and saw there was no answer provided. I looked further and found the answer in the Hive documentation.

由于对CTAS的以下限制,这永远不会起作用:

This will never work, because of the following restrictions on CTAS:

  1. 目标表不能是分区表.
  2. 目标表不能是外部表.
  3. 目标表不能是列表存储表.

来源: https://cwiki.apache.org/confluence/display/Hive/LanguageManual + DDL#LanguageManualDDL-CreateTableAsSelect%28CTAS

此外 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

创建[临时] [外部]表[如果不存在] [数据库名称.]表名称
...
[由(col_name,col_name,...)聚类[由(col_name [ASC | DESC],...)排序]]放入num_buckets BUCKETS]
...
[AS select_statement];

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
...
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
...
[AS select_statement];

聚簇要求先定义该列,然后cfg转到As select_statement,因此这时是不可能的.

Clustering requires the column to be defined and then the cfg goes to the As select_statement Therefore at this time it is not possible.

(可选)您可以更改表并添加存储桶,但这不会更改现有数据.

Optionally, you can ALTER the table and add buckets, but this does not change existing data.

CREATE TABLE BUCKET_TABLE 
STORED AS ORC AS 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll limit 0;
ALTER TABLE BUCKET_TABLE CLUSTERED BY (key) INTO 1000 BUCKETS;
ALTER TABLE BUCKET_TABLE SET TBLPROPERTIES ('transactional'='true');
INSERT INTO BUCKET_TABLE 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll;

这篇关于我可以按/存储通过"CREATE TABLE AS SELECT ....."创建的表进行聚类吗?在蜂巢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆