What's the best way to support array column types with external tables in Hive?

Problem description

I have external tables of tab-delimited data. A simple table looks like this:

create external table if not exists categories
(id string, tag string, legid string, image string, parent string, created_date string, time_stamp int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://somewhere/';

Now I'm adding another field to the end, which will hold a comma-separated list of values.

Is there a way to specify this in the same way that I specify a field terminator, or do I have to rely on one of the serdes?

e.g.:

...list_of_names ARRAY<String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ARRAY ELEMENTS SEPARATED BY ','
...

(I'm assuming I'll need to use a serde for this, but I figured there wasn't any harm in asking)

Solution

I don't know how to update an existing table to do that, but for creating a table, what you are looking for is covered in depth at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL. A snippet from there:

row_format
  : DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]

An example from our table creation is:

CREATE TABLE IF NOT EXISTS visits
(
    ... Columns Removed...
)
    PARTITIONED BY (userdate STRING)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\001'
        COLLECTION ITEMS TERMINATED BY '\002'
        MAP KEYS TERMINATED BY '\003'
    STORED AS TEXTFILE
;

The line you'd be looking for is COLLECTION ITEMS TERMINATED BY char, which sets the delimiter between the elements of an array.
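
Applied to the categories table from the question, this could look like the sketch below. It assumes the new column is the last field on each tab-separated line and that no individual name contains a comma. Because the table is external, dropping it normally removes only the metadata and leaves the files under LOCATION in place, so recreating the table with the new definition is one way around not knowing how to alter it in place.

```sql
-- Sketch only: assumes categories really is an EXTERNAL table, so that
-- DROP TABLE discards the metadata and not the files under LOCATION.
DROP TABLE IF EXISTS categories;

CREATE EXTERNAL TABLE IF NOT EXISTS categories
(id string, tag string, legid string, image string, parent string, created_date string, time_stamp int,
 list_of_names ARRAY<STRING>)   -- new trailing column from the question
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'             -- tab between columns
    COLLECTION ITEMS TERMINATED BY ','    -- comma between array elements
LOCATION 's3n://somewhere/';

-- The array column can then be indexed and measured like any Hive array.
SELECT id, list_of_names[0] AS first_name, size(list_of_names) AS name_count
FROM categories;
```

Rows in older files that lack the trailing field should come back with list_of_names as NULL, so it is worth checking a few of them after recreating the table.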

hth
