AWS S3 storage and schema

Question

I have an IoT sensor that sends the following message to an AWS IoT Core MQTT topic:

{"ID1":10001,"ID2":1001,"ID3":101,"ValueMax":123}

I have added an ACT/RULE that stores each incoming message in an S3 bucket with the timestamp as the key (each message is stored as a separate file/row in the bucket).

I have only worked with SQL databases before, so storing data like this is new to me.

1) Is this the proper way to work with S3 storage?

2) How can I visualize the values in a schema instead of as separate files?

3) I am trying to create an ML datasource from the S3 bucket, but I get the error below when Amazon ML tries to create the schema:

"Amazon ML can't retrieve the schema. If you've just created this datasource, wait a moment and try again."

Thanks for any advice!

Recommended answer

1) Is this the proper way to work with S3 storage?

With only one sensor, using the [timestamp](https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-functions.html#iot-function-timestamp) function in your IoT rule is one way to give objects unique names in S3, but there are issues that might come up.

  1. With more than one sensor, multiple messages might arrive at the same timestamp, so the timestamp alone would not generate unique object names in S3.

  2. Timestamps from nearly the same time have similar prefixes, and designing your S3 keys this way may not give you the best performance at higher message rates.

Since you're using MQTT, you could use the traceid() function instead of the timestamp to avoid these two issues if they come up.
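The naming concern above can be sketched outside the IoT rule in plain Python. This is only an illustration: the function names and key layout are hypothetical, and uuid4 stands in for the unique ID that traceid() would supply inside the rule.

```python
import uuid


def timestamp_key(timestamp_ms: int) -> str:
    """Key built only from the timestamp: two messages in the same millisecond collide."""
    return f"{timestamp_ms}.json"


def unique_key(timestamp_ms: int) -> str:
    """Key that starts with a random UUID, so keys are unique and do not share a time-based prefix."""
    return f"{uuid.uuid4()}-{timestamp_ms}.json"


same_instant = 1538606018066

# Same key twice: the second object would overwrite the first.
print(timestamp_key(same_instant) == timestamp_key(same_instant))

# Different keys: each message gets its own object.
print(unique_key(same_instant) == unique_key(same_instant))
```

Putting the random part first is a design choice made for this sketch; it also keeps keys written at nearly the same time from sharing a prefix.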

2) How can I visualize the values in a schema instead of as separate files?

3) I am trying to create an ML datasource from the S3 bucket, but I get the error below when Amazon ML tries to create the schema:

For the third question, I think you could be running into a data format problem in Amazon ML, because your S3 objects contain the JSON data from your messages and not CSV.

For the second question, I think you're trying to combine message data from successive messages into a CSV, or at least output each message's data as a single line of a CSV file. I don't think this is possible with the IoT SQL language alone, since it's intended to produce JSON.

One alternative is to configure your IoT SQL rule with a Lambda action, use a Lambda function to convert the JSON to CSV, and then write the CSV to your S3 bucket. If you go in this direction, you may have to enrich your IoT message data with the timestamp (or traceId) when you call the Lambda function.
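A minimal sketch of such a Lambda function might look like the following. It assumes the rule passes an event containing timestamp, traceid, and the original message under a message property; the bucket name, the column order, and the optional s3_client parameter (added only so the handler can be exercised without AWS) are all hypothetical.

```python
import csv
import io

BUCKET = "my-sensor-bucket"  # hypothetical bucket name
FIELDS = ["ID1", "ID2", "ID3", "ValueMax"]  # assumed column order


def to_csv_line(event: dict) -> str:
    """Render one enriched IoT message as a single CSV line (no header)."""
    message = event["message"]
    row = [event["timestamp"]] + [message[name] for name in FIELDS]
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue()


def lambda_handler(event, context, s3_client=None):
    """Entry point for the Lambda action: writes one CSV object per message."""
    if s3_client is None:
        import boto3  # imported lazily so the conversion logic runs without AWS

        s3_client = boto3.client("s3")
    key = f"{event['traceid']}.csv"
    body = to_csv_line(event).encode("utf-8")
    s3_client.put_object(Bucket=BUCKET, Key=key, Body=body)
    return {"key": key}
```

This still writes one object per message; it only changes the object format from JSON to CSV.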

A rule such as

select timestamp() as timestamp, traceid() as traceid, concat(ID1, ID2, ID3, ValueMax) as values, * as message

would generate JSON like

{"timestamp":1538606018066,"traceid":"abab6381-c369-4a08-931d-c08267d12947","values":[10001,1001,101,123],"message":{"ID1":10001,"ID2":1001,"ID3":101,"ValueMax":123}}

That would be straightforward to use as the source for a CSV row, with the data taken from its values property.
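As a rough sketch of that last step, the snippet below builds one CSV document (a header plus one row per message) from enriched JSON messages, reading each row from the values property. The function name and the header are assumptions; the header order is assumed to match the order of the arguments passed to concat in the rule.

```python
import csv
import io
import json

HEADER = ["ID1", "ID2", "ID3", "ValueMax"]  # assumed to match the concat(...) argument order


def messages_to_csv(json_lines):
    """Build one CSV document (header + one row per message) from enriched JSON messages."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for line in json_lines:
        writer.writerow(json.loads(line)["values"])
    return buffer.getvalue()


sample = [
    '{"timestamp":1538606018066,'
    '"traceid":"abab6381-c369-4a08-931d-c08267d12947",'
    '"values":[10001,1001,101,123],'
    '"message":{"ID1":10001,"ID2":1001,"ID3":101,"ValueMax":123}}'
]
print(messages_to_csv(sample))
```

A single CSV file with a header row is closer to the tabular input the question is after than many small JSON objects.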
