雅典娜查询数组列 [英] Athena Query for Array Column

查看:92
本文介绍了雅典娜查询数组列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在查询雅典娜的数组列时,我需要您的帮助.目前,我有一个如下所述的表格:

I require your help in querying the array column in athena. Presently i have a table as mentioned below:

1   2020-05-06 01:13:48 dv1 [{addedtitle=apple, addedvalue=null, keytitle=Increase apple, key=p9, recvalue=0.899999999, unit=lbs, isbalanced=null}, {addedtitle=Orange (12%), addedvalue=15.0, keytitle=Increase Orange, key=p8, recvalue=18.218999999999998, unit=fl oz, isbalanced=null}, {addedtitle=Lemon, addedvalue=32.0, keytitle=Increase Lemon, key=p10, recvalue=33.6, unit=oz, isbalanced=null}, {addedtitle=Calcium (100%), addedvalue=86.0, keytitle=Increase Calcium , key=p6, recvalue=88.72002, unit=oz, isbalanced=null}, {addedtitle=Mango, addedvalue=10.0, keytitle=Increase Mango, key=p11, recvalue=11.7, unit=oz, isbalanced=null}]
2   2020-05-07 04:30:45 dev2    [{addedtitle=apple (12%), addedvalue=0.0, keytitle=Increase apple, key=p8, recvalue=0.88034375, unit=fl oz, isbalanced=null}, {addedtitle=Orange(31.4%), addedvalue=0.0, keytitle=Decrease Orange, key=p10, recvalue=1.83733225, unit=fl oz, isbalanced=null}, {addedtitle=Tree, addedvalue=0.0, keytitle=Increase Tree, key=p11, recvalue=1.69, unit=oz, isbalanced=null}]
5   2020-05-06 12:55:12 dev5    [{addedtitle=salt, addedvalue=0.0, keytitle=Increase salt, key=p9, recvalue=0.052500000000000005, unit=lbs, isbalanced=null}]
6   2020-05-08 07:03:59 dev6    [{addedtitle=Sugar, addedvalue=6.0, keytitle=Decrease sugar, key=p9, recvalue=2.4000000000000004, unit=fl oz, isbalanced=null}]
7   2020-05-06 12:52:39 dev7    []
8   2020-05-06 04:15:05 dev8    []
9   2020-05-07 05:02:38 dev9    []

我需要将此第3个数组列细分为其他列,以便我可以快速将其导入.当前存在问题,因为quicksight无法识别第3列,因为它显示了不受支持的数据类型.

I need to breakdown this 3rd array column into further columns so that i can import this in quicksight. Presently have a problem as quicksight does not recognize the 3rd column as it shows unsupported data types.

有人可以帮忙如何将该数组分成多列/多行进行分析吗?

Can somebody please help as how to work on breaking this array into columns/rows for analyses?

推荐答案

不幸的是,示例中的类似JSON的数据不是Athena可以解析的格式.

The JSON-like data in your example is unfortunately not in a format that Athena can parse.

对于发现此问题的其他人,我可以解释一下如果数据是JSON格式(例如{"addedtitle": "apple",…而不是{addedtitle=apple,…)怎么办.我还要假设各列之间有制表符,而不是空格(如果有空格,则必须使用

For anyone else finding this question I can explain how it can be done if the data is JSON formatted (e.g. {"addedtitle": "apple",… and not {addedtitle=apple,…). I'm also going to assume that there are tabs between the columns and not spaces (if there are spaces you have to use the Grok serde).

首先,您创建一个表,该表读取制表符分隔的值:

First you create a table that reads tab-separated values:

CREATE EXTERNAL TABLE my_table (
  line_number int,
  date_stamp timestamp,
  id string,
  data string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://my-bucket/path/to/data/'

请注意如何将data列键入为string而不是复杂类型.如果每行只有JSON,我们可以使用JSON Serde并将其指定为复杂类型-但据我所知,TSV的Serde不支持复杂类型(也没有嵌入式JSON).

Note how the data column is typed as string and not a complex type. Had each row been only JSON we could have used the JSON serde and specified the type as a complex type – but as far as I know the serde for TSV does not support complex types (nor embedded JSON).

要从JSON数据中提取属性,我们可以使用 JSON函数,我们可以使用UNNEST从每个元素创建行.您可能想将两者结合起来,例如:

To extract properties from the JSON data we can use JSON functions, and we can use UNNEST create rows from each element. You are probably after a combination of the two, for eample:

SELECT
  id,
  JSON_EXTRACT_SCALAR(element, '$.addedtitle') AS addedtitle,
  JSON_EXTRACT_SCALAR(element, '$.recvalue') AS recvalue,
FROM my_table, UNNEST (JSON_PARSE(data) as ARRAY(JSON)) AS t(element)

鉴于您问题中的数据将返回:

Given the data in your question this would return:

id   | addedtitle    | recvalue
-----+---------------+----------------------
dv1  | apple         | 0.899999999
dv1  | Orange        | 18.218999999999998
dv1  | Lemon         | 33.6
dv1  | Calcium       | 88.72002
dv1  | Mango         | 11.7
dev2 | apple (12%)   | 0.88034375
dev2 | Orange(31.4%) | 1.83733225
dev2 | Tree          | 1.69
dev5 | salt          | 0.052500000000000005
dev6 | Sugar         | 2.4000000000000004

请注意,以上假设data列是有效的JSON,从您的问题来看,情况并非如此.数据看起来不像雅典娜支持的格式.

这篇关于雅典娜查询数组列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆