Hive nested ARRAY in MAP data type

Question

I have a Hive table structured like this:

CREATE TABLE test_stg (employee_id INT, name STRING, abu ARRAY<STRING>, sabu MAP<STRING, ARRAY<INT>>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY '/'
MAP KEYS TERMINATED BY ':';

I will be using LOAD DATA LOCAL....

The question is: how should I format the contents of my local file so that the MAP field sabu can hold a nested array?

Thanks in advance.

Recommended answer

Hive's default delimiters are:

  • Field delimiter => Control-A ('\001')
  • Collection item delimiter => Control-B ('\002')
  • Map key delimiter => Control-C ('\003')

If you override these delimiters, the overridden values are used during parsing. The description above holds for the usual case of flat data structures, where complex types contain only primitive types. For nested types, the nesting level determines the delimiter.

For an array of arrays, for example, the delimiters for the outer array are Control-B ('\002') characters, as expected, but for the inner array they are Control-C ('\003') characters, the next delimiter in the list.
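The array-of-arrays rule can be sketched in a few lines of Python. This is only an illustration of the level-based splitting described here, not Hive's actual parser; the function names are made up for the example, and the default delimiters (Control-B for the outer level, Control-C for the inner level) are assumed.

```python
# Illustration only: mimics the level-based splitting, not Hive's own parser.
OUTER = "\x02"  # Control-B, default collection item delimiter
INNER = "\x03"  # Control-C, the next delimiter in the list


def encode_array_of_arrays(rows):
    """Join inner arrays with Control-C and outer elements with Control-B."""
    return OUTER.join(INNER.join(str(v) for v in inner) for inner in rows)


def decode_array_of_arrays(text):
    """Split on Control-B first, then split each piece on Control-C."""
    return [piece.split(INNER) for piece in text.split(OUTER)]


encoded = encode_array_of_arrays([[1, 2], [3, 4, 5]])
decoded = decode_array_of_arrays(encoded)
print(repr(encoded))
print(decoded)
```

The decoded values come back as strings; converting them to the column's element type is left out to keep the sketch short.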

Hive actually supports eight levels of delimiters, corresponding to ASCII codes 1, 2, ... 8, but you can only override the first three.

In your case, the delimiter for the items of the array nested inside the MAP field sabu will be '\004', because the map key delimiter is '\003' (overridden as ':') and the nested array takes the next delimiter in the list.

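So you can write your input file in the following format, where each '\004' (quotes included) stands for a single Control-D character (ASCII code 4) rather than the literal text:

1|JOHN|abu1/abu2|key1:1'\004'2'\004'3/key2:6'\004'7'\004'8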
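Because Control-D (ASCII code 4) is awkward to type in an editor, one option is to generate the file with a small script. The following is a minimal Python sketch, assuming the delimiters from the table definition in the question ('|', '/', ':') plus Control-D for the array nested inside the map. The helper name format_row and the output file name are hypothetical.

```python
import os
import tempfile

FIELD = "|"      # fields terminated by '|'
ITEM = "/"       # collection items terminated by '/'
MAP_KEY = ":"    # map keys terminated by ':'
NESTED = "\x04"  # Control-D, separates items of the array nested in the map


def format_row(employee_id, name, abu, sabu):
    """Build one data line for test_stg from plain Python values."""
    abu_text = ITEM.join(abu)
    sabu_text = ITEM.join(
        key + MAP_KEY + NESTED.join(str(v) for v in values)
        for key, values in sabu.items()
    )
    return FIELD.join([str(employee_id), name, abu_text, sabu_text])


line = format_row(1, "JOHN", ["abu1", "abu2"], {"key1": [1, 2, 3], "key2": [6, 7, 8]})

# Hypothetical location; any local path readable by the Hive client would do.
output_path = os.path.join(tempfile.gettempdir(), "test_stg.txt")
with open(output_path, "w", encoding="utf-8", newline="\n") as f:
    f.write(line + "\n")

print(output_path)
print(repr(line))
```

The printed path is what you would then pass to LOAD DATA LOCAL INPATH when loading into test_stg.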

The output of SELECT * FROM test_stg; will be:

1       JOHN     ["abu1","abu2"]     {"key1":[1,2,3],"key2":[6,7,8]}
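To sanity-check a data line before loading it, the same delimiter levels can be applied by hand. The sketch below is an assumption-laden stand-in for what Hive does with this particular table ('|' for fields, '/' for collection items, ':' for map keys, Control-D for the nested array). It is not Hive's parser, and parse_row is a hypothetical helper.

```python
def parse_row(line):
    """Split one test_stg data line into (employee_id, name, abu, sabu)."""
    employee_id, name, abu, sabu = line.rstrip("\n").split("|")
    sabu_map = {}
    for entry in sabu.split("/"):            # map entries use the collection item delimiter
        key, values = entry.split(":", 1)    # key and value use the map key delimiter
        sabu_map[key] = [int(v) for v in values.split("\x04")]  # nested array uses Control-D
    return int(employee_id), name, abu.split("/"), sabu_map


row = parse_row("1|JOHN|abu1/abu2|key1:1\x042\x043/key2:6\x047\x048")
print(row)
```

If the parsed structure matches what you expect SELECT to show, the line is at least consistent with the delimiter scheme described in the answer.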

Reference: Hadoop: The Definitive Guide, Chapter 12: Hive, pages 433-434
