Cassandra timeseries data model
Let's assume there are 10 devices (dev01, dev02, dev03, etc.).
They send data at regular intervals and we collect it, so our data schema is:
dev01 :int
signalname :string
signaltime :date/time [with YY-MM-DD HHMMSS.mm]
Extradata :String
I want to push this data into Cassandra. Which way is best to store it?
My queries are:
1. Retrieve the current day's data for one device, or its data for some date range.
2. Retrieve the current day's data for 5 devices.
I am not sure whether the following way of storing data in Cassandra is the best model:
Standard columnfamily name: signalname
  row key: dev01
    columnname: timeseries(20120801124204) [YYMMDD HHMMSS]
    columnvalue: Json data
    columnname: timeseries(20120801124205) [YYMMDD HHMMSS] [next second data]
    columnvalue: Json data
  row key: dev02
    columnname: timeseries(20120801124204) [YYMMDD HHMMSS]
    columnvalue: Json data
    columnname: timeseries(20120801124205) [YYMMDD HHMMSS] [next second data]
    columnvalue: Json data
Or
Super columnfamily: signalname
  row key: Clientid1
    supercolumnname: dev01
      columnname: timeseries(20120801124204) [YYMMDD HHMMSS]
      columnvalue: Json data
    supercolumnname: dev02
      columnname: timeseries(20120801124204) [YYMMDD HHMMSS]
      columnvalue: Json data
  row key: Clientid2
    supercolumnname: dev03
      columnname: timeseries(20120801124204) [YYMMDD HHMMSS]
      columnvalue: Json data
    supercolumnname: dev04
      columnname: timeseries(20120801124204) [YYMMDD HHMMSS]
      columnvalue: Json data
Kindly help me out with this issue. Is there any other way?
Thanks & regards, Kannadhasan
I see 3 issues with your approach here which I will address below:
- super column families,
- thrift vs cql3,
- json data as cell values.
Before you go ahead: the use of super column families is discouraged. Read more here. Composite keys (as described below) are the way to go.
Also, you might need to read up on CQL3, because Thrift has been a legacy API since Cassandra 1.2.
Instead of storing JSON data, you can make use of native collection data types such as lists and maps. If you still want to work with JSON, Cassandra has had improved JSON support since version 2.2.
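To make the map option a bit more concrete, here is a minimal sketch of flattening a JSON payload into a string-to-string dict, which is the shape you would bind to a map<text, text> column. The function name flatten_payload, the dotted-key convention and the sample payload are my own assumptions for illustration, not part of any Cassandra API, and it assumes the payload is a JSON object at the top level.

```python
import json


def flatten_payload(raw, prefix=""):
    """Flatten a JSON object into a flat dict of str -> str.

    Nested objects become dotted keys ("gps.lat"); numbers, booleans,
    nulls and lists are stored as their JSON text. Assumes the top-level
    value is a JSON object (hypothetical payload shape).
    """
    data = json.loads(raw) if isinstance(raw, str) else raw
    flat = {}
    for key, value in data.items():
        full_key = prefix + key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten_payload(value, prefix=full_key + "."))
        elif isinstance(value, str):
            flat[full_key] = value
        else:
            flat[full_key] = json.dumps(value)
    return flat


# Hypothetical sample payload from one device reading.
sample = '{"temp": 21.5, "unit": "C", "gps": {"lat": 12.9, "lon": 77.6}}'
flat_sample = flatten_payload(sample)
print(flat_sample)
```

Whether this is worth it depends on how you read the data back: a map lets you update single entries, while a JSON text column keeps the payload opaque but intact.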
In general, it is pretty straightforward to query per device and per time period:
- Your row key would be the device id and the column key a timeuuid.
- To avoid hot spots, you could add "bucket" counters to the row key (creating a composite row/partition key) so that writes rotate across nodes.
- You can then query for time ranges if you know the row/device id.
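The three points above could be sketched roughly as follows. This assumes a day string is used as the bucket; the table name signals_by_device, its column names and the helper functions are hypothetical, and the CQL is only held in string constants here rather than executed against a cluster.

```python
from datetime import date, datetime, timedelta

# Hypothetical CQL3 table: the partition key is (device_id, day), so each
# device gets one partition per day; rows inside are ordered by event_time.
CREATE_TABLE = """
CREATE TABLE signals_by_device (
    device_id   text,
    day         text,
    event_time  timeuuid,
    signal_name text,
    payload     text,
    PRIMARY KEY ((device_id, day), event_time)
)
"""

# One query per (device_id, day) partition returns that device's day of data.
SELECT_DAY = (
    "SELECT event_time, signal_name, payload "
    "FROM signals_by_device WHERE device_id = ? AND day = ?"
)


def day_bucket(ts):
    """Return the day bucket (YYYYMMDD) for a date or datetime."""
    return ts.strftime("%Y%m%d")


def partitions_for_range(device_ids, start, end):
    """List the (device_id, day) partition keys covering start..end inclusive."""
    keys = []
    for device_id in device_ids:
        current = start
        while current <= end:
            keys.append((device_id, day_bucket(current)))
            current += timedelta(days=1)
    return keys


# Query 1: one device over a date range.
range_keys = partitions_for_range(["dev01"], date(2012, 8, 1), date(2012, 8, 3))
# Query 2: five devices for a single day.
five_devices = ["dev01", "dev02", "dev03", "dev04", "dev05"]
day_keys = partitions_for_range(five_devices, date(2012, 8, 1), date(2012, 8, 1))
print(range_keys)
print(day_keys)
```

Each key in the resulting list would be bound to SELECT_DAY in turn. A day is just one possible bucket size; with very frequent signals an hour bucket may keep partitions smaller.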
Alternatively, you could use your signal type as a row key (and a timeuuid/timestamp as a column key) if you want to query data for multiple devices (but one event type) at once. Read more on timeseries data in Cassandra in this blog entry.
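To show what this alternative layout buys you, here is a small in-memory sketch that mimics it: rows are grouped by a (signal_name, day) partition key and kept sorted by time, so a single partition read returns readings from several devices. This is only a toy model of the layout, not Cassandra client code; the signal names, the day bucket and the functions write and read_day are assumptions for illustration.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime

# Partition key (signal_name, day) -> rows sorted by (event_time, device_id).
partitions = defaultdict(list)


def write(signal_name, device_id, event_time, payload):
    """Store one reading in the partition for its signal type and day."""
    key = (signal_name, event_time.strftime("%Y%m%d"))
    insort(partitions[key], (event_time, device_id, payload))


def read_day(signal_name, day, device_ids=None):
    """Read one partition, optionally keeping only the given devices."""
    rows = partitions.get((signal_name, day), [])
    if device_ids is None:
        return list(rows)
    wanted = set(device_ids)
    return [row for row in rows if row[1] in wanted]


# Hypothetical readings, written out of time order on purpose.
write("temperature", "dev02", datetime(2012, 8, 1, 12, 42, 5), "{}")
write("temperature", "dev01", datetime(2012, 8, 1, 12, 42, 4), "{}")
write("pressure", "dev01", datetime(2012, 8, 1, 12, 42, 4), "{}")

for event_time, device_id, payload in read_day("temperature", "20120801"):
    print(event_time, device_id, payload)
```

The trade-off is that reading everything for one device now means touching one partition per signal type, so this layout fits best when most queries are per event type.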
Hope that helps!