HiveSQL访问JSON数组值 [英] HiveSQL access JSON-array values

查看:1009
本文介绍了HiveSQL访问JSON数组值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Hive中有一个表,该表是通过从HDFS中的序列文件读取而生成的.这些序列文件是 json ,如下所示:

I have a table in Hive, which is generated by reading from a Sequence File in my HDFS. Those sequence files are json and look like this:

{"Activity":"Started","CustomerName":"CustomerName3","DeviceID":"StationRoboter","OrderID":"CustomerOrderID3","DateTime":"2018-11-27T12:56:47Z+0100","Color":[{"Name":"red","Amount":1},{"Name":"green","Amount":1},{"Name":"blue","Amount":1}],"BrickTotalAmount":3}

他们提交产品零件的颜色,并在一次服务过程中对它们的数量进行计数.

They submit product part colours and the amount of them which are counted in one service process run.

请注意颜色

因此,我创建表的代码是:

Therefore my code to create the table is:

CREATE EXTERNAL TABLE iotdata(
  activity              STRING,
  customername          STRING,
  deviceid              STRING,
  orderid               STRING,
  datetime              STRING,
  color                 ARRAY<MAP<String,String>>,
  bricktotalamount      STRING
)
ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION '/IoTData/scray-data-000-v0';

这有效,并且如果我在该表上执行选择* ,则它看起来像这样:

This works, and if I do a select * on that table it looks like this:

但是我的问题是,我必须访问颜色列中的数据以进行分析.例如,我要计算表中的所有红色值.

But my problem is, that I have to access the data inside the color column for analysis. For example, I want to calc all red values in the table.

因此,这带来了一些机会和问题:如何将创建的金额字符串转换为整数?

So this leads to several opportunities and questions: how can I cast the amount string which is created to an integer?

如何通过select访问色列中的数据?

How can I access the data in my color-column via select?

或者是否有可能在一开始就更改我的表模式,以便为我的4种颜色增加4个额外的列,并为相关色量增加4个额外的列?

Or is there a possibility to change my table schema right at the beginning to get 4 extra columns for my 4 colours and 4 extra columns for the related colour amounts?

我还尝试将整个json作为字符串读入一列,然后在其中选择子内容,但这

I also tried to read in the whole json as string to one column, and select the subcontent there, but this importing json array into hive leads me only to NULL values, propably because my json file is not 100% well-formed.

推荐答案

您可以分两步执行此操作.

You can do this in two steps.

创建正确的JSON表

CREATE external TABLE temp.test_json (
  activity string,
  bricktotalamount int,
  color array<struct<amount:int, name:string>>,
  customername string,
  datetime string,
  deviceid string,
  orderid string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
location '/tmp/test_json/table'

展开Select语句中的表

select activity, bricktotalamount, customername, datetime, deviceid, orderid, name, amount from temp.test_json
lateral view inline(color) c as amount,name

这篇关于HiveSQL访问JSON数组值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆