在AWS Athena中的json文件中存储多个元素 [英] Store multiple elements in json files in AWS Athena

查看:218
本文介绍了在AWS Athena中的json文件中存储多个元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在S3存储桶中存储了一些json文件,其中每个文件都有多个具有相同结构的元素.例如,

I have some json files stored in a S3 bucket , where each file has multiple elements of same structure. For example,

[{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}},{"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}},{"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}]

我想在雅典娜中创建一个与上述数据相对应的表.

I want to create a table in Athena corresponding to above data.

我为创建表而编写的查询:

The query I wrote for creating the table:

CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.elb_logs2 (
  `eventId` string,
  `eventName` string,
  `eventVersion` string,
  `eventSource` string,
  `awsRegion` string,
  `image` map<string,string> 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'field.delim' = ' '
) LOCATION 's3://<bucketname>/';

但是,如果我按如下所示执行SELECT查询,

But if I do a SELECT query as follows,

SELECT * FROM sampledb.elb_logs4;

我得到以下结果:

1   {"eventid":"1","eventversion":"1.0","image":{"id":"101","message":"New item!"},"eventsource":"aws:dynamodb","eventname":"INSERT","awsregion":"us-west-2"}   {"eventid":"2","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"MODIFY","awsregion":"us-west-2"}   {"eventid":"3","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"REMOVE","awsregion":"us-west-2"}   

此处将json文件的全部内容选为一项.

The entire content of the json file is picked as one entry here.

如何将json文件的每个元素作为一个条目读取?

How can I read each element of json file as one entry?

如何读取图像的每个子列,即地图的每个元素?

How can I read each subcolumn of image, i.e., each element of the map?

谢谢.

推荐答案

问题1:在AWS Athena的json文件中存储多个元素

我需要将json文件重写为

I need to rewrite my json file as

{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2", "image":{"Message":新项目!","Id":101}},{"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource :" aws:dynamodb," awsRegion:" us-west-2," image:{" Message:"此项已更改," Id:101}},{" eventId:" 3," eventName:"删除," eventVersion:" 1.0," eventSource:" aws:dynamodb," awsRegion:" us-west-2," image:{" Message: 此项目已更改","Id":101}}

{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}}, {"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}, {"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}

那是

删除方括号[]将每个元素放在一行中

Remove the square brackets [ ] Keep each element in one line

{.....................}
{.....................}
{.....................}

问题2.访问非线性json属性

CREATE EXTERNAL TABLE IF NOT EXISTS <tablename> (
  `eventId` string,
  `eventName` string,
  `eventVersion` string,
  `eventSource` string,
  `awsRegion` string,
  `image` struct <`Id` : string,
                  `Message` : string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
   "dots.in.keys" = "true"
) LOCATION 's3://exampletablewithstream-us-west-2/';

查询:

select image.Id, image.message from <tablename>;

参考:

http://engineering.skybettingandgaming.com/2015 /01/20/parsing-json-in-hive/

https://github.com/rcongiu /Hive-JSON-Serde#mapping-hive-keywords

这篇关于在AWS Athena中的json文件中存储多个元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆