Cassandra timeseries data model


Question

Let's assume there are 10 devices (dev01, dev02, dev03, etc.).

Each device sends data at some interval, and we collect that data, so our data schema is:

 dev01      :int
 signalname :string
 signaltime :date/time[with YY-MM-DD HHMMSS.mm]
 Extradata  :String

I want to push this data into Cassandra. What is the best way to store it?

My queries are like:

1. Retrieve a device's data for the current day, or for some date range.

2. Retrieve the current day's data for 5 devices.

I am not sure whether the following way of storing the data in Cassandra is the best model:

Standard columnfamily Name:signalname
row key                   :dev01
columnname                :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue               :Json data
columnname                :timeseries(20120801124205)[YYMMDD HHMMSS][next second data]
columnvalue               :Json data

row key               :dev02
columnname            :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue           :Json data
columnname            :timeseries(20120801124205)[YYMMDD HHMMSS][next second data]
columnvalue           :Json data

Or  

Super columnfamily   :signalname
row key              :Clientid1

supercolumnname      :dev01
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

supercolumnname      :dev02
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data


row key              :Clientid2

supercolumnname      :dev03
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

supercolumnname      :dev04
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

Kindly help me out with this issue. Is there any other way?

Thanks & Regards, Kannadhasan

Solution

I see 3 issues with your approach, which I will address below:

  • super column families,
  • thrift vs cql3,
  • json data as cell values.

Before you go ahead: the use of super column families is discouraged. Read more here. Composite keys (as described below) are the way to go.
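To make the composite-key suggestion concrete, here is a minimal Python sketch (plain Python, not Cassandra or driver code) that flattens the super-column layout from the question into rows addressed by a compound key. Roughly speaking, CQL3 exposes a super column family as a table with a compound primary key, so the super column name (the device) becomes a clustering column. The names `client_id`, `device_id` and `signal_time` are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical super-column layout from the question:
# row key (client id) -> super column (device id) -> column (timestamp string) -> JSON value
SuperCF = Dict[str, Dict[str, Dict[str, str]]]

# One flattened row: (client_id, device_id, signal_time, value)
Row = Tuple[str, str, str, str]


def flatten_super_columns(scf: SuperCF) -> List[Row]:
    """Turn nested super columns into rows keyed by a compound key.

    client_id plays the role of the partition key; device_id and signal_time
    play the role of clustering columns, so within each client the rows come
    out sorted by (device_id, signal_time), like rows inside one partition.
    The YYYYMMDDHHMMSS strings are fixed-width, so string order is time order.
    """
    rows: List[Row] = []
    for client_id, devices in scf.items():
        for device_id in sorted(devices):
            columns = devices[device_id]
            for signal_time in sorted(columns):
                rows.append((client_id, device_id, signal_time, columns[signal_time]))
    return rows


if __name__ == "__main__":
    example: SuperCF = {
        "Clientid1": {
            "dev02": {"20120801124204": '{"v": 2}'},
            "dev01": {"20120801124204": '{"v": 1}'},
        },
    }
    for row in flatten_super_columns(example):
        print(row)
```

In CQL3 terms this corresponds to something like `PRIMARY KEY ((client_id), device_id, signal_time)`; whether `client_id` is a good partition key at all depends on how many devices and rows each client has.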

Also, you should read up on CQL3, since Thrift has been a legacy API since Cassandra 1.2.

Instead of storing JSON data, you can use native collection data types such as lists and maps. If you still want to work with JSON, Cassandra has had improved JSON support since version 2.2.
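As an illustration of replacing the JSON blob with a native collection, the sketch below assumes a hypothetical CQL3 table whose `extra` column is a `map<text, text>` and shows the client-side conversion of the question's `Extradata` JSON string into a flat string-to-string dict that a driver could bind to such a column. The table name, column names and the day-bucketed key are made up for the example. Nested JSON values are kept as compact JSON strings rather than flattened, which is only one of several reasonable choices.

```python
import json
from typing import Dict

# Hypothetical CQL3 table (shown for context only; not executed here).
CREATE_TABLE = """
CREATE TABLE signals_by_device_day (
    device_id   text,
    bucket_day  date,
    signal_time timestamp,
    signal_name text,
    extra       map<text, text>,
    PRIMARY KEY ((device_id, bucket_day), signal_time)
)"""


def extradata_to_map(extradata_json: str) -> Dict[str, str]:
    """Convert the Extradata JSON string into a map<text, text>-compatible dict.

    String values are kept as-is; every other value (numbers, booleans, null,
    nested objects or arrays) is stored as its compact JSON text, because a
    map<text, text> can hold neither typed nor nested values.
    """
    parsed = json.loads(extradata_json)
    if not isinstance(parsed, dict):
        raise ValueError("Extradata must be a JSON object to fit a map<text, text>")
    result: Dict[str, str] = {}
    for key, value in parsed.items():
        if isinstance(value, str):
            result[key] = value
        else:
            result[key] = json.dumps(value, separators=(",", ":"))
    return result
```

For example, `extradata_to_map('{"temp": 21.5, "unit": "C"}')` yields `{"temp": "21.5", "unit": "C"}`. If the extra fields have a fixed, known shape, plain typed columns (or a user-defined type) may fit better than a map.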

In general, it is pretty straightforward to query per device and per time period:

  • Your row key would be the device id, and the column key a timeuuid.
  • To avoid hot spots, you could add "bucket" counters to the row key (creating a composite row/partition key) so that writes rotate across nodes.
  • You can then query for time ranges if you know the row/device id.
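The three points above can be modelled without a cluster. The following Python sketch is a toy in-memory stand-in (not a driver or Cassandra API) for a hypothetical table with `PRIMARY KEY ((device_id, day), signal_time)`. Here the bucket is the calendar day, which is one common choice (a counter- or hash-based bucket number is another), and `signal_time` plays the clustering column as a plain timestamp instead of a timeuuid, for simplicity. A date-range query for one device becomes one sorted slice per day partition.

```python
import bisect
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import DefaultDict, List, Tuple

# Partition key = (device_id, day bucket); clustering key = signal_time.
PartitionKey = Tuple[str, date]


class DeviceTimeSeries:
    """Toy in-memory model of a table with PRIMARY KEY ((device_id, day), signal_time)."""

    def __init__(self) -> None:
        # Each partition keeps its rows sorted by signal_time, like a clustering column.
        self._times: DefaultDict[PartitionKey, List[datetime]] = defaultdict(list)
        self._values: DefaultDict[PartitionKey, List[str]] = defaultdict(list)

    @staticmethod
    def partition_key(device_id: str, ts: datetime) -> PartitionKey:
        # One bucket per device per calendar day keeps each partition bounded.
        return (device_id, ts.date())

    def insert(self, device_id: str, ts: datetime, value: str) -> None:
        key = self.partition_key(device_id, ts)
        idx = bisect.bisect_right(self._times[key], ts)
        self._times[key].insert(idx, ts)
        self._values[key].insert(idx, value)

    def query_range(self, device_id: str, start: datetime,
                    end: datetime) -> List[Tuple[datetime, str]]:
        """Return rows with start <= signal_time < end, visiting one partition per day."""
        results: List[Tuple[datetime, str]] = []
        day = start.date()
        while day <= end.date():
            key = (device_id, day)
            # .get() avoids creating empty partitions as a side effect of reading.
            times = self._times.get(key, [])
            values = self._values.get(key, [])
            lo = bisect.bisect_left(times, start)
            hi = bisect.bisect_left(times, end)
            results.extend(zip(times[lo:hi], values[lo:hi]))
            day += timedelta(days=1)
        return results
```

The first query from the question ("today's data for a device") is then `query_range(device, midnight_today, midnight_tomorrow)`. Against a real cluster, each per-day slice would correspond roughly to `SELECT ... WHERE device_id = ? AND day = ? AND signal_time >= ? AND signal_time < ?`, issued once per day in the range.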

Alternatively, you could use your signal type as a row key (and a timeuuid/timestamp as a column key) if you want to query data for multiple devices (but one event type) at once. Read more about time series data in Cassandra in this blog entry.
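For the second query (today's data for 5 devices), the signal-type layout can also be sketched in plain Python. This assumes a hypothetical table keyed by `PRIMARY KEY ((signal_name, day), signal_time, device_id)`. The day bucket and the `device_id` clustering column are assumptions added for the example; `device_id` keeps two devices that report at the same instant from overwriting each other. One partition read then serves all devices for that signal type and day, and the device filter is done client-side here.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

# Partition key = (signal_name, day); rows inside are (signal_time, device_id, value).
PartitionKey = Tuple[str, date]
Row = Tuple[datetime, str, str]
Record = Tuple[str, str, datetime, str]  # (device_id, signal_name, signal_time, value)


def group_by_signal_and_day(records: Iterable[Record]) -> Dict[PartitionKey, List[Row]]:
    """Place records into partitions keyed by (signal_name, day),
    each sorted by (signal_time, device_id) like clustering columns."""
    partitions: DefaultDict[PartitionKey, List[Row]] = defaultdict(list)
    for device_id, signal_name, signal_time, value in records:
        partitions[(signal_name, signal_time.date())].append((signal_time, device_id, value))
    for rows in partitions.values():
        rows.sort()
    return dict(partitions)


def devices_for_day(partitions: Dict[PartitionKey, List[Row]], signal_name: str,
                    day: date, devices: Set[str]) -> List[Row]:
    """Read a single partition and keep only the requested devices."""
    return [row for row in partitions.get((signal_name, day), []) if row[1] in devices]
```

The trade-off: that one partition grows with the number of devices reporting the signal, and the filter may discard most of it. With the device-keyed layout you would instead issue one query per device (5 queries here), each reading only that device's data.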

Hope that helps!
