加载结构或配置单元中的任何其他复杂数据类型 [英] load struct or any other complex data type in hive

查看:114
本文介绍了加载结构或配置单元中的任何其他复杂数据类型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个.xlsx文件,其中包含一些类似于下图的数据,正在尝试使用下面的创建查询来创建数据:

  create table aus_aboriginal(code int,area_name string,male_0_4 STRUCT ,male_5_9 STRUCT ,male_10_14 STRUCT   

当我将数据加载到它时,会得到空值。考虑使用,逗号分隔符的格式如下的csv文件。
Input.csv



Code,area_name,num,total,perc,num,total, perc,num,total,perc $ b $ 1100,Albury,90,444,17.4,73,546,13.4,86,546,15.8

1111,armid,40,404,14.4,97,701 ,13.8,76,701,10.8



预期的结果是在字段(num,total和perc)之外创建复杂类型, :


1100,Albury,struct <90,444,17.4>,struct <73,546,13.4>,struct <86,546,15.8 >
$ b $ 1111,armid,struct <40,404,14.4>,struct <97,701,13.8>,struct <76,701,10.8>
blockquote>

当我们试图在这种情况下使用下面的hive查询来创建一个复杂类型(num,total和perc)时,我们将在表中获得多个空值同样,逗号分隔符用于字段和集合,所以Hive查询未能按照我们的要求分离数据。

 蜂房>创建表aus_aboriginal(code int,area_name string,male_0_4 STRUCT ,male_5_9 STRUCT ,male_10_14 STRUCT   

输出:


1100 Albury {num:90.0,total:null,perc:null} {num:444.0,total:nul
l,perc:null} {num :17.4,total:null,perc:null}

1111 armid {num:40.0,total:null,perc: null} {num:404.0,total:nul
l,perc:null} {num:14.4,total:null,perc:null}



所用时间:0.15秒,提取:2行($)

我怀疑你正面临着这个问题。

结构的使用
现在考虑输入文件的数据格式如下,其中,逗号分隔符用于字段


1100,Albury,90#444#17.4,73#546#13.4, 86#546#15.8

1111,armid,40#404#14.4,97#701#13.8,76#701#10.8

在这种情况下,我们可以通过将#指定为收集项目的分隔符和字段来成功创建具有复杂类型的表格。请检查下面的hive查询。

  hive>创建表aus_aboriginal(code int,area_name string,male_0_4 STRUCT ,male_5_9 STRUCT ,male_10_14 STRUCT   

输出:

 蜂房> select * from aus_aboriginal; 




1100 Albury {num:90.0,total:444.0 ,perc:17.4} {num:73.0,total:546。
0,perc:13.4} {num:86.0,total:546.0,perc:15.8}

<1111 armid { num:40.0,total:404.0,perc:14.4} {num:97.0,total:701。
0,perc:13.8} {num:76.0,total:701.0,perc:10.8}

花费的时间:0.146秒,获取:2行($)

其他复杂类型也应采用类似的方法,请参阅下面的链接以获取更多信息。



参考:
http://edu-kinect.com/blog/2014/06/16/hive-complex-data-types-with-examples /


I have a .xlsx file which contains data some thing like the below image, am trying to create using the below create query

create table aus_aboriginal( code int, area_name string, male_0_4 STRUCT<num:double, total:double, perc:double>, male_5_9 STRUCT<num:double, total:double, perc:double>, male_10_14 STRUCT<num:double, total:double, perc:double>, male_15_19 STRUCT<num:double, total:double, perc:double>, male_20_24 STRUCT<num:double, total:double, perc:double>, male_25_29 STRUCT<num:double, total:double, perc:double>, male_30_34 STRUCT<num:double, total:double, perc:double>, male_35_39 STRUCT<num:double, total:double, perc:double>, male_40_44 STRUCT<num:double, total:double, perc:double>, male_45_49 STRUCT<num:double, total:double, perc:double>, male_50_54 STRUCT<num:double, total:double, perc:double>, male_55_59 STRUCT<num:double, total:double, perc:double>, male_60_64 STRUCT<num:double, total:double, perc:double>, male_above_65 STRUCT<num:double, total:double, perc:double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

When am loading the data into it am getting nulls.. I am missing something in the create table it self.. any help to fix it..

解决方案

On using complex types like struct, it is recommended to use a unique delimiter for collection than the one used for fields (columns). Consider a csv file in below format where "," comma separator is used. Input.csv

Code, area_name,num,total,perc,num,total,perc,num,total,perc 1100,Albury,90,444,17.4,73,546,13.4,86,546,15.8

1111,armid,40,404,14.4,97,701,13.8,76,701,10.8

Expected result is to create a complex type out of fields (num, total and perc):

1100,Albury,struct<90,444,17.4>,struct<73,546,13.4>,struct<86,546,15.8>

1111,armid, struct<40,404,14.4>, struct<97,701,13.8>,struct<76,701,10.8>

when we try to create a complex type out of fields (num, total and perc) in this case using following hive query, we will get multiple null values in the table since the same "," comma delimiter is used for both fields and collections, so Hive query failed to segregate the data as we required.

Hive> create table aus_aboriginal( code int, area_name string, male_0_4 STRUCT<num:double, total:double, perc:double>, male_5_9 STRUCT<num:double, total:double, perc:double>, male_10_14 STRUCT<num:double, total:double, perc:double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY  ',' COLLECTION ITEMS TERMINATED BY ',' LOCATION '/csv';

Output:

1100 Albury {"num":90.0,"total":null,"perc":null} {"num":444.0,"total":nul l,"perc":null} {"num":17.4,"total":null,"perc":null}

1111 armid {"num":40.0,"total":null,"perc":null} {"num":404.0,"total":nul l,"perc":null} {"num":14.4,"total":null,"perc":null}

Time taken: 0.15 seconds, Fetched: 2 row(s)

I am suspecting that you are facing this problem.

Usage of Struct Now consider the input file having data in below format, where "," comma delimiter is used for fields and for collection items "#" is used as delimiter.

1100,Albury,90#444#17.4,73#546#13.4,86#546#15.8

1111,armid,40#404#14.4,97#701#13.8,76#701#10.8

In this case, we can successfully create a table with complex type by specifying # as delimiter for collection items and , for fields. Please check below hive query.

hive> create table aus_aboriginal( code int, area_name string, male_0_4 STRUCT<num:double, total:double, perc:double>, male_5_9 STRUCT<num:double, total:double, perc:double>, male_10_14 STRUCT<num:double, total:double, perc:double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY  ',' COLLECTION ITEMS TERMINATED BY '#' LOCATION '/csv';

Output:

hive> select * from aus_aboriginal;

1100 Albury {"num":90.0,"total":444.0,"perc":17.4} {"num":73.0,"total":546. 0,"perc":13.4} {"num":86.0,"total":546.0,"perc":15.8}

1111 armid {"num":40.0,"total":404.0,"perc":14.4} {"num":97.0,"total":701. 0,"perc":13.8} {"num":76.0,"total":701.0,"perc":10.8}

Time taken: 0.146 seconds, Fetched: 2 row(s)

Similar approach should be taken for other complex types as well, refer below link for more information.

Reference: http://edu-kinect.com/blog/2014/06/16/hive-complex-data-types-with-examples/

这篇关于加载结构或配置单元中的任何其他复杂数据类型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆