Consuming/producing data to a particular shardID in Amazon Kinesis

Question

I need to put all the records from various servers into Kinesis and output the data into multiple S3 files. I have been trying to do this with ShardID, but I have not been able to make it work.

Can you help?

Python/Java would be fine.

Recommended answer

ShardID is not that important.

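  • If you have 20 MB/sec of input bandwidth and 20,000 requests/sec, you should have at least 20 shards (each shard accepts up to 1 MB/sec and 1,000 records/sec of writes).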
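The sizing rule above can be sketched as a small calculation. This is only an illustration: it assumes the standard per-shard write limits of 1 MB/sec and 1,000 records/sec, and the function name `min_shards` is hypothetical.

```python
import math

# Assumed per-shard write limits (standard Kinesis Data Streams quotas).
MAX_MB_PER_SEC_PER_SHARD = 1.0
MAX_RECORDS_PER_SEC_PER_SHARD = 1000


def min_shards(input_mb_per_sec, records_per_sec):
    """Return the smallest shard count that covers both write limits."""
    by_bandwidth = math.ceil(input_mb_per_sec / MAX_MB_PER_SEC_PER_SHARD)
    by_requests = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_bandwidth, by_requests)


print(min_shards(20, 20000))  # 20
```

Whichever limit is hit first decides the count, so a stream with small records but a high request rate may need more shards than its bandwidth alone suggests.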

Your data is spread across the shards, so the shard count is only a matter of capacity. The shards do not affect your input and output results. (They also affect parallelization, through the hash of the partition key, but that is a separate topic that I am leaving out to avoid confusion.)
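For readers who do want to see how records end up spread across shards, here is a rough sketch. Kinesis maps each partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The even split in `even_hash_ranges` is an assumption for illustration (the real ranges come from the stream description), and both function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def even_hash_ranges(shard_count):
    """Split the hash space into contiguous, evenly sized ranges (assumed layout)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_index_for_key(partition_key, ranges):
    """Return the index of the range that contains the key's MD5 hash."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash ranges do not cover the key space")


ranges = even_hash_ranges(20)
for key in ("server-1", "server-2", "server-3"):
    print(key, "->", shard_index_for_key(key, ranges))
```

The same partition key always lands in the same range, which is why the producer only needs to choose a sensible key and never a shard.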

You should be concerned with the "put_record" or "put_records" methods on the producer (i.e. input) side, and with the records emitted (i.e. output) on the consumer side. You should not worry about which shard a record passed through; you just take the record on the consumer side and process it according to your business needs.
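A minimal sketch of the producer side, assuming a boto3-style Kinesis client is passed in (for example `boto3.client("kinesis")`). The helper names, the `server_id` key field, and the JSON encoding are assumptions for illustration; only `put_records` with `StreamName`, `Records`, `Data`, and `PartitionKey` comes from the client API.

```python
import json

MAX_RECORDS_PER_CALL = 500  # put_records accepts at most 500 records per request


def to_entries(events, key_field):
    """Turn dict events into put_records entries, keyed by one field."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[key_field]),
        }
        for event in events
    ]


def send_events(client, stream_name, events, key_field="server_id"):
    """Send events in batches; return how many records Kinesis rejected."""
    failed = 0
    entries = to_entries(events, key_field)
    for start in range(0, len(entries), MAX_RECORDS_PER_CALL):
        batch = entries[start:start + MAX_RECORDS_PER_CALL]
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response["FailedRecordCount"]
    return failed
```

No shard ID appears anywhere in the call: the partition key is the only routing input. A non-zero return value means some records were throttled or rejected and should be retried.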

The Kinesis Client Library ( https://github.com/awslabs/amazon-kinesis-client ) is the best fit for this abstraction.

There is also a sample project on GitHub, Amazon Kinesis Connectors ( https://github.com/awslabs/amazon-kinesis-connectors ), that consumes data and uploads it to S3.
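To illustrate the consumer-side idea of splitting output into multiple S3 files by record content rather than by shard, here is a simplified sketch. It is not how the connectors project is implemented. It assumes the records are UTF-8 JSON payloads and that a boto3-style S3 client is passed in; the function names, the grouping field, and the key layout are hypothetical.

```python
import json
from collections import defaultdict


def group_records(raw_records, group_field):
    """Decode JSON payloads and group them by one field, ignoring shards."""
    groups = defaultdict(list)
    for raw in raw_records:
        event = json.loads(raw.decode("utf-8"))
        groups[str(event[group_field])].append(event)
    return groups


def upload_groups(s3_client, bucket, groups, prefix="kinesis-output"):
    """Write each group to its own S3 object as JSON lines; return the keys."""
    keys = []
    for name, events in sorted(groups.items()):
        key = f"{prefix}/{name}.jsonl"
        body = "\n".join(json.dumps(e) for e in events).encode("utf-8")
        s3_client.put_object(Bucket=bucket, Key=key, Body=body)
        keys.append(key)
    return keys
```

In a real consumer the object key would also need a timestamp or sequence number, since this sketch overwrites the same key on every batch.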