Call Azure Stream Analytics UDF with multi-dimensional array of last 5 records, grouped by record


Question

I am trying to call an AzureML UDF from a Stream Analytics query, and that UDF expects an array of 5 rows and 2 columns. The input data is streamed from an IoT hub, and each incoming message has two fields: temperature and humidity.

This would be the "pass-through query":

SELECT GetMetadataPropertyValue([room-telemetry], 'IoTHub.ConnectionDeviceId') AS RoomId, 
       Temperature, Humidity
INTO
    [maintenance-alerts]
FROM
    [room-telemetry]

I have an AzureML UDF (successfully created) that should be called with the last 5 records per RoomId and that returns one value from the ML model. There are multiple rooms in my stream, so I need some kind of windowing that yields 5 records grouped per RoomId. I cannot find a way to call the UDF with the right arrays selected from the input stream. I know I can create a JavaScript UDF that returns an array built from specific fields, but that would work record by record, whereas here I need it to work across multiple records grouped by RoomId.

Does anyone have any insight?

Best regards

Recommended answer

Following the good suggestion from @jean-sébastien and an answer to a separate question about the array parsing, I was finally able to stitch everything together into a solution that builds (I still have to get it to run at runtime, though).

The solution consists of using CollectTop to aggregate the latest rows of the entity you want to group by, together with the specification of a time window.
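To make the shape of that aggregate more concrete, here is a rough plain-JavaScript emulation of what a per-key "top N, newest first" collection produces. This is only a sketch: `collectTopPerKey` and the sample events are hypothetical and are not part of Stream Analytics, and the time window is ignored (all sample events are assumed to fall into a single window). The `rank`/`value` record layout mirrors what CollectTop is documented to return.

```javascript
// Hypothetical helper (not a Stream Analytics API): groups events by a key
// field and keeps the n newest events per key as { rank, value } records.
function collectTopPerKey(events, keyField, orderField, n) {
    var groups = {};
    events.forEach(function (e) {
        var key = e[keyField];
        if (key === null || key === undefined) { return; } // skip events without a key
        (groups[key] = groups[key] || []).push(e);
    });

    var result = {};
    Object.keys(groups).forEach(function (key) {
        result[key] = groups[key]
            .slice() // copy so the grouped array is not sorted in place
            .sort(function (a, b) { // descending by the ordering field
                if (a[orderField] < b[orderField]) { return 1; }
                if (a[orderField] > b[orderField]) { return -1; }
                return 0;
            })
            .slice(0, n)
            .map(function (e, i) { return { rank: i + 1, value: e }; });
    });
    return result;
}

// Made-up sample events; ISO timestamps in the same format sort correctly as strings.
var sampleEvents = [
    { engineid: "A", tmp: 20.5, hum: 40, eventtime: "2020-01-01T00:00:00Z" },
    { engineid: "B", tmp: 30.0, hum: 55, eventtime: "2020-01-01T00:01:00Z" },
    { engineid: "A", tmp: 21.0, hum: 41, eventtime: "2020-01-01T00:02:00Z" },
    { engineid: "A", tmp: 22.0, hum: 42, eventtime: "2020-01-01T00:03:00Z" }
];

var windows = collectTopPerKey(sampleEvents, "engineid", "eventtime", 2);
console.log(JSON.stringify(windows, null, 2));
```

Each key ends up with an array of at most `n` ranked records, so a key with fewer events in the window yields a shorter array.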

The next step was to create a JavaScript UDF that takes that data structure and parses it into a multi-dimensional array.

This is the query I have now:

-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
    SELECT  engineid, tmp, hum, eventtime
    FROM    [engine-telemetry] 
    WHERE   engineid IS NOT NULL
),
-- Grouping by engineid in TimeWindows
TimeWindows AS
(
    SELECT engineid, 
        CollectTop(2) OVER (ORDER BY eventtime DESC) as TimeWindow
    FROM
        [RelevantTelemetry]
    WHERE engineid IS NOT NULL
    GROUP BY TumblingWindow(hour, 24), engineid
)
--Output timewindows for verification purposes
SELECT engineid, Udf.Predict(Udf.getTimeWindows(TimeWindow)) as Prediction
INTO debug
FROM TimeWindows

And this is the JavaScript UDF:

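    // Converts the CollectTop result (an array of { rank, value } records)
    // into a two-dimensional array of [tmp, hum] rows.
    function getTimeWindows(input){
        var output = [];
        // Index loop instead of for...in, which is fragile on arrays
        for(var i = 0; i < input.length; i++){
            var array = [];
            array.push(input[i].value.tmp);
            array.push(input[i].value.hum);
            output.push(array);
        }
        return output;
    }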
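As a quick local check, the UDF logic can be run in plain JavaScript against a hand-written input. The sketch below assumes the input arrives as an array of `{ rank, value }` records (the layout CollectTop is documented to return). The sample values and the `hasExpectedShape` helper are hypothetical additions for illustration and are not part of the original answer.

```javascript
// Copy of the UDF logic: turns ranked records into [tmp, hum] rows.
function getTimeWindows(input) {
    var output = [];
    for (var i = 0; i < input.length; i++) {
        output.push([input[i].value.tmp, input[i].value.hum]);
    }
    return output;
}

// Hypothetical guard: checks that the result has the row and column
// counts the downstream model expects (5 x 2 in the question).
function hasExpectedShape(rows, expectedRows, expectedCols) {
    if (rows.length !== expectedRows) { return false; }
    return rows.every(function (row) { return row.length === expectedCols; });
}

// Made-up sample in the assumed CollectTop layout.
var sampleTimeWindow = [
    { rank: 1, value: { engineid: "A", tmp: 22.0, hum: 42, eventtime: "2020-01-01T00:03:00Z" } },
    { rank: 2, value: { engineid: "A", tmp: 21.0, hum: 41, eventtime: "2020-01-01T00:02:00Z" } }
];

var rows = getTimeWindows(sampleTimeWindow);
console.log(JSON.stringify(rows));          // prints [[22,42],[21,41]]
console.log(hasExpectedShape(rows, 5, 2));  // prints false: only 2 rows collected
```

Note that the query above uses CollectTop(2), and a window may also contain fewer events than requested, so a model that strictly requires 5 rows would probably need CollectTop(5) plus some handling of short windows.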
