Unnesting in SQL (Athena): How to convert array of structs into an array of values plucked from the structs?

Problem description

I am taking samples from a Bayesian statistical model, serializing them with Avro, uploading them to S3, and querying them with Athena.

I need help writing a query that unnests an array in the table.

The CREATE TABLE statement looks like this:

CREATE EXTERNAL TABLE `model_posterior`(
  `job_id` bigint,
  `model_id` bigint,
  `parents` array<struct<`feature_name`:string,`feature_value`:bigint, `is_zid`:boolean>>,
  `posterior_samples` struct <`parameter`:string,`is_scaled`:boolean,`samples`:array<double>>)

The "samples" array in the "posterior_samples" column is where the samples are stored. I have managed to unnest the "posterior_samples" struct with the following query:

WITH samples AS (
    SELECT model_id, parents, sample, sample_index
    FROM posterior_db.model_posterior 
    CROSS JOIN UNNEST(posterior_samples.samples) WITH ORDINALITY AS t (sample, sample_index)
    WHERE job_id = 111000020709
)
SELECT * FROM samples

Now what I want is to unnest the parents column. Each record in this column is an array of structs. I am trying to create a column that contains just an array of the values of the "feature_value" key from that array of structs. (The reason I want an array is that the parents array can have a length > 1.)

In other words, for each array in the parents column, I want an array of the same size. That array should contain only the values of the "feature_value" key from the structs in the original array.

Any advice on how to solve this?

Thanks.

Recommended answer

You can use the transform function, which applies a lambda expression to every element of an array and returns a new array of the same length. Assuming a table (or CTE) named samples with the structure described in your question, you can write a query along these lines:

SELECT *, transform(parents, parent -> parent.feature_value) AS only_feature_values
FROM samples
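
To try the transform call without touching the real table, here is a minimal self-contained sketch. The inline rows, the CTE name demo, and the column alias feature_values are made up for illustration; only the parents struct layout is taken from the question. It assumes an Athena engine that supports lambda expressions, as Presto and Trino do.

```sql
-- Hypothetical inline data that mimics the shape of the parents column.
WITH demo AS (
    SELECT * FROM (
        VALUES
            (1, ARRAY[
                CAST(ROW('a', 10, true)  AS ROW(feature_name varchar, feature_value bigint, is_zid boolean)),
                CAST(ROW('b', 20, false) AS ROW(feature_name varchar, feature_value bigint, is_zid boolean))
            ]),
            (2, ARRAY[
                CAST(ROW('c', 30, true)  AS ROW(feature_name varchar, feature_value bigint, is_zid boolean))
            ])
    ) AS t (model_id, parents)
)
SELECT
    model_id,
    -- transform keeps the array length and replaces each struct
    -- with its feature_value field.
    transform(parents, p -> p.feature_value) AS feature_values
FROM demo
```

Each output row carries an array with one bigint per struct in parents, so a two-element parents array yields a two-element feature_values array.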

Note: this query may not be syntactically perfect, but it should give you something to work from.
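
If it helps, the same transform call can also be applied directly in the query from the question, alongside the existing UNNEST of the samples array. This is only a sketch: the alias parent_feature_values is an assumed name, and the table and job_id filter are copied from the question as-is.

```sql
SELECT
    model_id,
    -- Array of feature_value entries, same length as parents.
    transform(parents, parent -> parent.feature_value) AS parent_feature_values,
    sample,
    sample_index
FROM posterior_db.model_posterior
CROSS JOIN UNNEST(posterior_samples.samples) WITH ORDINALITY AS t (sample, sample_index)
WHERE job_id = 111000020709
```

Because transform works on each row's array in place, it does not add rows the way UNNEST does; the row count is still driven only by the samples array.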

Hope this helps. Cheers :)
