How to select data from inside a JSON formatted as a string in Kinesis Analytics (SQL)

Problem description

I have a Kinesis data stream that delivers data in this format:

created_at: timestamp
payload: varchar(6000)

A simplified example of the payload element:

{
    "version": 2.0,
    "data": {
        "whatever": "someString",
        "observations": [{
            "obs_id": 1,
            "location": {
                "lat": 10.000,
                "lng": 20.000
            }
        }, {
            "obs_id": 2,
            "location": {
                "lat": 10.0001,
                "lng": 20.0001
            }
        }]
    }
}

In real time, the array data.observations in column payload is usually between 0 and 200 elements long.

I'm trying to expand the data in payload and create a new row for every element of data.observations. My expected outcome for this example is a data stream with the following structure:

created_at timestamp, -- from root
obs_id integer, -- from inside data.observations
location_lat integer, -- from inside data.observations.location
location_lng integer, -- from inside data.observations.location
version integer -- from root

This is where I am now. It works, but it does not extract anything from the JSON:

-- CREATE OR REPLACE STREAM for cleaned up referrer
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "created_at" TIMESTAMP,
    "version" Integer
    );

CREATE OR REPLACE PUMP "myPUMP" AS 
   INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM 
         "created_at", 
         "version"
      FROM "SOURCE_SQL_STREAM_001";

However, if I try to do this, it breaks:

-- CREATE OR REPLACE STREAM for cleaned up referrer
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "created_at" TIMESTAMP,
    "version" Integer,
    "obs_id" integer 
    );

CREATE OR REPLACE PUMP "myPUMP" AS 
   INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM 
         "created_at", 
         "version",
         "data"."observations"."obs_id" as obs_id
      FROM "SOURCE_SQL_STREAM_001";

The error is: table data not found

Any help is highly appreciated!

I have now tried this:

-- CREATE OR REPLACE STREAM for cleaned up referrer
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "version" Integer
    , "whatever" varchar(10)
);

CREATE OR REPLACE PUMP "myPUMP" AS 
   INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM 
        "version"
        , json_extract("data", "$.whatever") AS whatever,
      FROM "SOURCE_SQL_STREAM_001";

And I get this error:

org.eigenbase.sql.parser.SqlParseException: Encountered "FROM" at line 10, column 7. Was expecting one of: "*" ... <IDENTIFIER> ... <QUOTED_IDENTIFIER> ... <UNICODE_QUOTED_IDENTIFIER> ... "+" ... "-" ... <UNSIGNED_INTEGER_LITERAL> ... <DECIMAL_NUMERIC_LITERAL> ... <APPROX_NUMERIC_LITERAL> ... <BINARY_STRING_LITERAL> ... <PREFIXED_STRING_LITERAL> ... <QUOTED_STRING> ... <UNICODE_STRING_LITERAL> ... "TRUE" ... "FALSE" ... "UNKNOWN" ... "NULL" ... <LBRACE_D> ... <LBRACE_T> ... <LBRACE_TS> ... "DATE" ... "TIME" ... "TIMESTAMP" ... "INTERVAL" ... "?" ... "CAST" ... "DATEDIFF" ... "EXTRACT" ... "POSITION" ... "CONVERT" ... "TRANSLATE" ... "OVERLAY" ... "FLOOR" ... "CEIL" ... "CEILING" ... "STEP" ... "TUMBLE_WINDOW" ... "SUBSTRING" ... "TRIM" ... "FIRST_VALUE" ... "LAST_VALUE" ... "LAG" ... "NTH_VALUE" ... <LBRACE_FN> ... "MULTISET" ... "SPECIFIC" ... "ABS" ... "ANY" ... "AVG" ... "CARDINALITY" ... "CHAR_LENGTH" ... "CHARACTER_LENGTH" ... "COALESCE" ... "COLLECT" ... "CUME_DIST" ... "COUNT" ... "CURRENT_DATE" ... "CURRENT_TIME" ... "CURRENT_TIMESTAMP" ... "DENSE_RANK" ... "ELEMENT" ... "EVERY" ... "EXP_AVG" ... "EXP" ... "FUSION" ... "INITCAP" ... "LN" ... "LOCALTIME" ... "LOCALTIMESTAMP" ... "LOWER" ... "MAX" ... "MIN" ... "MOD" ... "NULLIF" ... "OCTET_LENGTH" ... "PERCENT_RANK" ... "POWER" ... "RANK" ... "ROW_NUMBER" ... "SQRT" ... "STDDEV" ... "STDDEV_POP" ... "STDDEV_SAMP" ... "SUM" ... "UPPER" ... "VAR_POP" ... "VAR_SAMP" ... "CURRENT_CATALOG" ... "CURRENT_DEFAULT_TRANSFORM_GROUP" ... "CURRENT_PATH" ... "ROWNUM" ... "CURRENT_ROLE" ... "CURRENT_SCHEMA" ... "CURRENT_USER" ... "SESSION_USER" ... "SYSTEM_USER" ... "USER" ... "NEW" ... "CASE" ... "PERIOD" ... "TSDIFF" ... "CURSOR" ... "ROW" ... "NOT" ... "EXISTS" ... "(" ...

Recommended answer

According to the Amazon Athena documentation at https://docs.aws.amazon.com/athena/latest/ug/extracting-data-from-JSON.html you can use json_extract for this.

Something like the following:

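-- Take a few raw rows; "data" holds the JSON document as a string
with dataset as (
  select data from vendor_meraki_data_raw
  limit 5
),

-- Pull the nested "data" object out of each document
jsondata as (
  select
    json_extract(data, '$.data') as fulldata
  from dataset
)

-- Extract a single field from the nested object
select
  json_extract(fulldata, '$.apMac') as apMac
from jsondata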
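
The same extract-then-expand idea can be sketched outside SQL, which may help when checking what the flattened rows from the question should look like. The following is a minimal Python sketch, not Kinesis Analytics code. The function name flatten_payload, the sample created_at value, and the spelling "location" for the nested key are assumptions made for illustration. It parses the payload string and emits one row per element of data.observations, copying created_at and version from the root.

```python
import json


def flatten_payload(created_at, payload):
    """Return one row (dict) per element of data.observations.

    created_at is passed through unchanged; payload is the JSON string
    from the payload column. Missing keys yield None or no rows.
    """
    doc = json.loads(payload)
    version = doc.get("version")
    observations = doc.get("data", {}).get("observations", [])

    rows = []
    for obs in observations:
        # Assumes the nested key is spelled "location".
        location = obs.get("location", {})
        rows.append({
            "created_at": created_at,
            "obs_id": obs.get("obs_id"),
            "location_lat": location.get("lat"),
            "location_lng": location.get("lng"),
            "version": version,
        })
    return rows


# Sample payload modeled on the simplified example in the question.
sample_payload = json.dumps({
    "version": 2.0,
    "data": {
        "whatever": "someString",
        "observations": [
            {"obs_id": 1, "location": {"lat": 10.000, "lng": 20.000}},
            {"obs_id": 2, "location": {"lat": 10.0001, "lng": 20.0001}},
        ],
    },
})

# The timestamp string here is a made-up placeholder value.
rows = flatten_payload("2020-01-01 00:00:00", sample_payload)
for row in rows:
    print(row)
```

An empty observations array produces no rows, which matches the case where the array has 0 elements.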
