How to Define Nested Collection Items in Hive


Question

I am trying to create a Hive table with nested collection items. Suppose I have an array of structs:

    CREATE TABLE SAMPLE(
    record array<struct<col1:string,col2:string>>
    )row format delimited
    fields terminated by ','
    collection items terminated by '|';

First level: the separator ',' overrides the default field delimiter '^A'.

Second level: the separator '|' overrides the default second-level delimiter '^B' and separates the elements of the outermost structure (i.e. the array).

Third level: Hive uses the default third-level delimiter '^C' to separate the fields of the struct.
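To make the three levels concrete, here is a minimal Python sketch that writes one row in the layout described above: ',' between columns, '|' between array elements, and the default '^C' (byte 0x03) between struct fields. The helper name `encode_row` and the sample values are made up for illustration, and the code only models the text layout; it is not how Hive itself is implemented.

```python
FIELD_DELIM = ","        # level 1: between columns
COLLECTION_DELIM = "|"   # level 2: between array elements
STRUCT_DELIM = "\x03"    # level 3: default '^C', between struct fields


def encode_row(columns):
    """Encode a row whose columns are each a list of (col1, col2) tuples.

    Hypothetical helper: it joins struct fields, then array elements,
    then columns, using one delimiter per nesting level.
    """
    encoded_columns = []
    for array_value in columns:
        elements = [STRUCT_DELIM.join(struct) for struct in array_value]
        encoded_columns.append(COLLECTION_DELIM.join(elements))
    return FIELD_DELIM.join(encoded_columns)


# One column (record) holding an array of two structs.
row = encode_row([[("a", "b"), ("c", "d")]])
print(repr(row))  # 'a\x03b|c\x03d'
```

The `repr` output shows why '^C' is awkward in practice: the delimiter is an unprintable control character that has to be written as an escape.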

Now my question is: how can I define a separator for the third level (i.e. the struct fields)? The '^C' character is hard to read and hard to generate.

Is there any way to explicitly define this separator instead of using ^C?

Thanks in advance.

Recommended answer

Try this:

    CREATE TABLE SAMPLE(
    id BIGINT,
    record array<struct<col1:string,col2:string>>
    )row format delimited
    fields terminated by ','
    collection items terminated by '|'
    map keys terminated by ':';

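Here the map-key delimiter ':' takes the place of the default third-level delimiter '^C', so it separates the struct fields inside each array element. The data in the text file will then look like this: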

    1345653,110909316904:1341894546|221065796761:1341887508
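As a rough check of how that line breaks down, the following Python sketch splits it with the same three delimiters (',' then '|' then ':'). The function name `parse_row` and the dictionary layout are assumptions made for illustration; this is a simplified model of the splitting, not Hive's actual parsing code.

```python
def parse_row(line, field_delim=",", collection_delim="|", struct_delim=":"):
    """Split one text row into (id, record) following the table definition.

    Hypothetical helper: record becomes a list of dicts, one per struct.
    """
    # Level 1: the field delimiter separates the id column from the record column.
    id_text, record_text = line.split(field_delim, 1)
    record = []
    # Level 2: the collection delimiter separates the array elements.
    for element in record_text.split(collection_delim):
        # Level 3: the map-key delimiter separates the struct fields.
        col1, col2 = element.split(struct_delim)
        record.append({"col1": col1, "col2": col2})
    return int(id_text), record


row_id, record = parse_row(
    "1345653,110909316904:1341894546|221065796761:1341887508"
)
print(row_id)                             # 1345653
print([item["col1"] for item in record])  # ['110909316904', '221065796761']
```

The second printed list is the col1 value of every struct in the array, one entry per array element.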

You can then query it like this:

    select record.col1 from SAMPLE;
