How to Define Nested Collection Items in Hive
Question
I am trying to create a Hive table with nested collection items. Suppose I have an array of structs:
CREATE TABLE SAMPLE(
record array<struct<col1:string,col2:string>>
)row format delimited
fields terminated by ','
collection items terminated by '|';
At the first level, the separator ',' overrides the default field delimiter '^A'.
At the second level, the separator '|' overrides the default second-level delimiter '^B' and separates the elements of the outermost structure (i.e. the array).
At the third level, Hive uses the default third-level delimiter '^C' to separate the fields of the struct.
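The following minimal Python sketch (an illustration, not Hive code) shows what a data file would need under the question's DDL. The array items are joined with '|', but the struct fields inside each item fall back to ^C, which is the non-printing control character '\x03'. The helper names serialize_record and serialize_row are hypothetical, and escaping and NULL handling are ignored.

```python
# Delimiters from the question's DDL: ',' between columns and '|' between
# array items. The struct inside the array falls back to Hive's default
# third-level delimiter ^C, i.e. the control character '\x03'.
FIELD_SEP = ","
ITEM_SEP = "|"
STRUCT_SEP = "\x03"  # ^C, invisible in most editors


def serialize_record(record):
    """Hypothetical helper: encode one array<struct<col1,col2>> value."""
    return ITEM_SEP.join(STRUCT_SEP.join((col1, col2)) for col1, col2 in record)


def serialize_row(columns):
    """Hypothetical helper: join the top-level columns of one row."""
    return FIELD_SEP.join(columns)


row = serialize_row([serialize_record([("a", "b"), ("c", "d")])])
print(repr(row))  # 'a\x03b|c\x03d'
```

The '\x03' bytes in the output are exactly the characters the question finds hard to read and to produce.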
Now my question is how I can define a separator for the third level (i.e. the struct), because the '^C' character is hard both to read and to generate.
Is there any way to explicitly define a separator to use instead of ^C?
Thanks in advance.
Recommended answer
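Try this. Hive's delimited text format assigns delimiters by nesting depth, and MAP KEYS TERMINATED BY sets the third-level delimiter, so here it also separates the fields of a struct nested inside an array, even though the table has no map column: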
CREATE TABLE SAMPLE(
id BIGINT,
record array<struct<col1:string,col2:string>>
)row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by ':';
The data in the text file will then look like this:
1345653,110909316904:1341894546|221065796761:1341887508
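To see how this sample line maps onto the table, here is a minimal Python sketch that splits it level by level with the answer's delimiters. It is a simplified illustration (no escape characters, no NULLs, every row well formed), not Hive's actual SerDe, and parse_row is a hypothetical helper.

```python
# Delimiters declared in the answer's DDL, by nesting level:
# level 1 (columns) ',', level 2 (array items) '|', level 3 (struct fields) ':'.
FIELD_SEP = ","
ITEM_SEP = "|"
STRUCT_SEP = ":"


def parse_row(line):
    """Hypothetical helper: split one line into (id, record) by nesting level."""
    id_text, record_text = line.split(FIELD_SEP)  # level 1: the two columns
    record = []
    for item in record_text.split(ITEM_SEP):  # level 2: array elements
        col1, col2 = item.split(STRUCT_SEP)  # level 3: struct fields
        record.append({"col1": col1, "col2": col2})
    return int(id_text), record


row_id, record = parse_row("1345653,110909316904:1341894546|221065796761:1341887508")
print(row_id)  # 1345653
print([item["col1"] for item in record])  # ['110909316904', '221065796761']
```

The last line collects col1 from every array element, which corresponds to the per-row array that the query on record.col1 returns.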
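You can then query it like this (because record is an array of structs, record.col1 returns, for each row, an array of the col1 values of all its elements):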
select record.col1 from SAMPLE;