HIVE nested ARRAY in MAP data type


Question

I have a Hive table structured like this:

CREATE TABLE test_stg (employee_id INT, name STRING, abu ARRAY<STRING>, sabu MAP<STRING, ARRAY<INT>>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY '/'
MAP KEYS TERMINATED BY ':';

I will import the data from the local file system using LOAD DATA LOCAL....

How should I lay out the contents of my local file so that the MAP field sabu can hold nested arrays?

Thanks in advance.

Solution

Hive's default delimiters are:

  • Field Delimiter => Control-A ('\001')
  • Collection Item Delimiter => Control-B ('\002')
  • Map Key Delimiter => Control-C ('\003')

If you override these delimiters, the overridden values are used during parsing. The description above holds for the usual case of flat data structures, where complex types contain only primitive types. For nested types, the nesting level determines the delimiter.

For an array of arrays, for example, the delimiters for the outer array are Control-B ('\002') characters, as expected, but for the inner array they are Control-C ('\003') characters, the next delimiter in the list.

Hive actually supports eight levels of delimiters, corresponding to ASCII codes 1, 2, ... 8, but you can only override the first three.
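
As a rough illustration of the level rule (a model of the description above, not Hive's own code), the delimiter list can be written out in Python. The function name delimiter_for_level and its keyword arguments are hypothetical, and levels are counted from 1 so that they line up with the ASCII codes.

```python
# Hypothetical model of the delimiter list described above; this is not Hive's API.
DEFAULTS = [chr(code) for code in range(1, 9)]  # ASCII 1..8, one per nesting level


def delimiter_for_level(level, fields=None, collection_items=None, map_keys=None):
    """Return the delimiter used at a 1-based nesting level.

    Only the first three levels accept an override, mirroring the
    FIELDS / COLLECTION ITEMS / MAP KEYS clauses of ROW FORMAT DELIMITED.
    """
    if not 1 <= level <= len(DEFAULTS):
        raise ValueError("level must be between 1 and 8")
    overrides = [fields, collection_items, map_keys]
    if level <= 3 and overrides[level - 1] is not None:
        return overrides[level - 1]
    return DEFAULTS[level - 1]


# With the overrides from the question's table definition:
print(repr(delimiter_for_level(3, "|", "/", ":")))  # ':'
print(repr(delimiter_for_level(4, "|", "/", ":")))  # '\x04'
```

Level 4 has no override clause, so it always falls back to Control-D in this model.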

In your case, the map key delimiter '\003' (overridden as ':') occupies the third level, so the items of the array nested inside the MAP field sabu are separated by the next delimiter in the list, '\004' (Control-D).

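So you can write your input file in the following format, where each '\004' stands for a single Control-D byte (ASCII code 4) rather than the quoted text typed literally: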

1|JOHN|abu1/abu2|key1:1'\004'2'\004'3/key2:6'\004'7'\004'8
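
Because Control-D is awkward to type in an editor, one option is to generate the file with a short script. The sketch below assumes the delimiters from the table definition above; build_row and write_rows are hypothetical helper names, not part of Hive.

```python
# Sketch: build the sample row with a real Control-D (0x04) byte between array items.
CTRL_D = "\x04"  # level-4 delimiter for the arrays nested inside the map


def build_row(employee_id, name, abu, sabu):
    """Serialize one row using '|', '/', ':' and Control-D as delimiters."""
    abu_text = "/".join(abu)
    sabu_text = "/".join(
        key + ":" + CTRL_D.join(str(value) for value in values)
        for key, values in sabu.items()
    )
    return "|".join([str(employee_id), name, abu_text, sabu_text])


def write_rows(path, rows):
    """Write one row per line, using '\n' as the line terminator."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for row in rows:
            handle.write(row + "\n")


row = build_row(1, "JOHN", ["abu1", "abu2"], {"key1": [1, 2, 3], "key2": [6, 7, 8]})
print(repr(row))  # repr makes the otherwise invisible control bytes visible
```

Calling write_rows with a path of your choice and [row] produces a file that LOAD DATA LOCAL can then be pointed at.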

The output of SELECT * FROM test_stg; will be:

1       JOHN     ["abu1","abu2"]     {"key1":[1,2,3],"key2":[6,7,8]}
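
To sanity-check a data file before loading it, the same layout can be decoded in plain Python. This is only a sketch that mimics the splitting order for this particular table; parse_row is a hypothetical helper and does not reproduce Hive's parser.

```python
def parse_row(line):
    """Split one line of the test_stg layout into typed Python values."""
    employee_id, name, abu, sabu = line.rstrip("\n").split("|")
    entries = {}
    for entry in sabu.split("/"):           # level 2: map entries
        key, _, values = entry.partition(":")  # level 3: key from value
        entries[key] = [int(v) for v in values.split("\x04")]  # level 4: array items
    return int(employee_id), name, abu.split("/"), entries


sample = "1|JOHN|abu1/abu2|key1:1\x042\x043/key2:6\x047\x048\n"
print(parse_row(sample))
# (1, 'JOHN', ['abu1', 'abu2'], {'key1': [1, 2, 3], 'key2': [6, 7, 8]})
```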

Reference: Hadoop: The Definitive Guide, Chapter 12: Hive, pages 433-434
