Cassandra: Table design with timestamp and large dataset

Question

I am having issues querying large volumes of data by a single day. I am looking for advice on creating an efficient table schema.

Table: eventlog

Columns: recordid (UUID), insertedtimestamp (timestamp), source (Text), event (Text)

If I just do this:

CREATE TABLE eventlog (
    recordid uuid PRIMARY KEY,
    insertedtimestamp timestamp,
    source text,
    event text
); 

Then the query below will get overwhelmed by the volume of data, assuming today is 1/25.

select * from eventlog where insertedtimestamp > '2017-01-25';

The goal is to select all the records from a single day, knowing that we need to partition efficiently because the tables may hold millions of records. How would I design an efficient table schema (what partition key setup)? Thank you.

Recommended answer

Since you want to get all the records for a single day, you can use this schema:

CREATE TABLE eventlog (
    day int,
    month int,
    year int,
    recordid uuid,
    insertedtimestamp timestamp,
    source text,
    event text,
    PRIMARY KEY((day,month,year),recordid)
); 
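
With this schema, every write has to supply day, month and year alongside insertedtimestamp. A minimal Python sketch of deriving those columns from the timestamp follows. The helper names (partition_key_for, insert_params), the sample values "web01" and "login", and the choice to bucket days in UTC are assumptions for illustration, not part of the original answer. The %s placeholders follow the style commonly used for non-prepared statements in Python drivers; adjust to whatever your client expects.

```python
import uuid
from datetime import datetime, timezone

# Statement text only; executing it with a Cassandra client is left out here.
INSERT_CQL = (
    "INSERT INTO eventlog "
    "(day, month, year, recordid, insertedtimestamp, source, event) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)

def partition_key_for(ts):
    """Return the (day, month, year) partition key for an aware datetime.

    Assumption: days are bucketed in UTC so that every writer agrees on
    which partition a given instant belongs to.
    """
    ts_utc = ts.astimezone(timezone.utc)
    return (ts_utc.day, ts_utc.month, ts_utc.year)

def insert_params(ts, source, event, recordid=None):
    """Build the parameter tuple matching INSERT_CQL."""
    if recordid is None:
        recordid = uuid.uuid4()  # random UUID for the clustering column
    day, month, year = partition_key_for(ts)
    return (day, month, year, recordid, ts, source, event)

# Hypothetical event written on 2017-01-25.
ts = datetime(2017, 1, 25, 13, 30, tzinfo=timezone.utc)
params = insert_params(ts, "web01", "login")
print(params[:3])  # (25, 1, 2017)
```

Keeping the key derivation in one place helps ensure that reads and writes compute the same partition for the same date.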

So all of the data for a single day will be in a single partition, and therefore stored together on the same replica nodes. Now you can get the data for a date, say 2017-01-25, more efficiently with the query below:

SELECT * FROM eventlog WHERE day = 25 AND month = 1 AND year = 2017;
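
Because the partition key is the full (day, month, year) tuple, a query must name one specific day. To read a range of days, one option is to issue one such query per day. The sketch below only builds the statement and parameters for each day; the helper name day_queries and the %s placeholder style are assumptions for illustration, and running the statements through a Cassandra client is left out.

```python
from datetime import date, timedelta

SELECT_CQL = (
    "SELECT * FROM eventlog WHERE day = %s AND month = %s AND year = %s"
)

def day_queries(start, end):
    """Yield (cql, params) for each day from start to end, inclusive."""
    current = start
    while current <= end:
        yield SELECT_CQL, (current.day, current.month, current.year)
        current += timedelta(days=1)

# Three daily partitions spanning a month boundary.
for cql, params in day_queries(date(2017, 1, 30), date(2017, 2, 1)):
    print(params)
# (30, 1, 2017)
# (31, 1, 2017)
# (1, 2, 2017)
```

Each generated query reads exactly one partition, so the cost per query stays bounded by one day's data.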
