Cassandra column key auto increment


Question

I am trying to understand Cassandra and how to structure my column families (CF), but it's quite hard since I am used to relational databases...

For example, if I create a simple users CF and try to insert a new row, how can I make an auto-incrementing key like in MySQL?

I saw a lot of examples where you just use the username instead of a unique ID, and that makes some sense, but what if I want to allow duplicate usernames?

Also, how can I run searches when, from what I understand, Cassandra does not support > operators, so something like select * from users where something > something2 would not work?

And probably the most important question: what about grouping? Would I need to retrieve all the data and then filter it in whatever language I am using? I think that would slow down my system a lot.

So basically I need a brief explanation of how to get started with Cassandra.

Answer

Your questions are quite general, but let me take a stab at it. First, you need to model your data in terms of your queries. With an RDBMS, you model your data in some normalized form, then optimize later for your specific queries. You cannot do this with Cassandra; you must write your data the way you intend to read it. Often this means writing it more than one way. In general, it helps to completely shed your RDBMS thinking if you want to work effectively with Cassandra.

Regarding keys:

  • They are used in Cassandra as the unit of distribution across the ring. So your key will get hashed and assigned an "owner" in the ring. Use the RandomPartitioner to guarantee even distribution.

  • Presuming you use RandomPartitioner (you should), keys are not sorted. This means you cannot ask for a range of keys. You can, however, ask for a list of keys in a single query.

  • Keys are relevant in some models and not in others. If your model requires query-by-key, you can use any unique value that your application is aware of (such as a UUID). Sometimes keys are sentinel values, such as a Unix epoch representing the start of the day. This allows you to hand Cassandra a bunch of known keys, then get a range of data sorted by column (see below).
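
To make the points about keys concrete, here is a minimal Python sketch that uses only the standard library and does not talk to Cassandra. It shows the two kinds of keys mentioned above (a UUID and a start-of-day Unix epoch) and a simplified picture of how a hashed key ends up with an owner in the ring. The function names, the ring size, and the evenly spaced node tokens are assumptions made for illustration; RandomPartitioner does hash keys with MD5, but this is not its exact token computation.

```python
import bisect
import hashlib
import uuid
from datetime import datetime, timezone

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def new_row_key():
    """Return a random UUID string to use as a unique row key."""
    return str(uuid.uuid4())


def day_bucket_key(ts):
    """Return the Unix epoch (seconds) of midnight UTC on the day of `ts`."""
    midnight = ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp())


def node_tokens(node_count):
    """Evenly spaced tokens, one per hypothetical node in the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def owner_index(key, tokens):
    """Hash the key with MD5 and return the index of the node that owns it."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big") % RING_SIZE
    # First node whose token is >= the key's token; wrap to node 0 past the end.
    return bisect.bisect_left(tokens, token) % len(tokens)
```

Because ownership depends on the hash rather than on the key itself, two keys that sort next to each other can land on unrelated nodes. That is why a UUID works as a replacement for an auto-increment ID, and why asking for a range of keys is not meaningful under this partitioner.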

Regarding query predicates:

  • You can get ranges of data presuming you model it correctly to answer your queries.

  • Since columns are written in sorted order, you can query a range from column A to column n with a slice query (which is very fast). You can also use composite columns to abstract this mechanism a bit.

  • You can use secondary indexes on columns where you have low cardinality--this gives you query-by-value functionality.

  • You can create your own indexes where the data is sorted the way you need it.
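
As a rough illustration of why slice queries can stand in for a > operator, the following Python sketch models a single wide row whose column names are kept in sorted order, then uses it as a hand-built index. It is an in-memory toy, not Cassandra client code; the `WideRow` class and the `(age, user_id)` composite column names are hypothetical.

```python
import bisect


class WideRow:
    """Toy model of one row whose columns are stored sorted by name."""

    def __init__(self):
        self._names = []    # column names, always kept sorted
        self._values = {}   # column name -> value

    def insert(self, name, value):
        """Add or overwrite a column, preserving sort order of names."""
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return columns with start <= name <= finish, in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# A hand-built index row: composite column names (age, user_id), empty values.
users_by_age = WideRow()
for age, user_id in [(25, "u1"), (31, "u2"), (40, "u3"), (30, "u4")]:
    users_by_age.insert((age, user_id), "")

# "age > 30" becomes a slice from (31,) to the end of the row.
older_than_30 = [name[1] for name, _ in
                 users_by_age.slice((31,), (float("inf"),))]
```

The range lookup is two binary searches over already sorted names, which is the reason the data has to be written in the order you intend to read it.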

Regarding grouping:

I presume you're referring to creating aggregates. If you need your data in real-time, you'll want to use some external mechanism (like Storm) to track data and constantly update your relevant aggregates into a CF. If you are creating aggregates as part of a batch process, Cassandra has excellent integration with Hadoop, allowing you to write map/reduce jobs in Pig, Hive, or directly in your language of choice.
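
The real-time option amounts to updating the aggregate on every write instead of scanning all rows at read time. A minimal Python sketch of that idea follows, with an in-memory dictionary standing in for the aggregate CF; the `RunningAggregates` class and the per-country grouping are assumptions for illustration and do not represent Storm's or Cassandra's API.

```python
from collections import defaultdict


class RunningAggregates:
    """Keeps per-group count and total up to date as events arrive."""

    def __init__(self):
        # group name -> {"count": int, "total": float}
        self._groups = defaultdict(lambda: {"count": 0, "total": 0.0})

    def record(self, group, amount):
        """Fold one new event into the aggregate for its group."""
        agg = self._groups[group]
        agg["count"] += 1
        agg["total"] += amount

    def count(self, group):
        """Number of events seen for the group (0 if none)."""
        agg = self._groups.get(group)
        return agg["count"] if agg else 0

    def average(self, group):
        """Mean amount for the group, or None if no events were seen."""
        agg = self._groups.get(group)
        return agg["total"] / agg["count"] if agg else None


aggregates = RunningAggregates()
for country, amount in [("US", 10.0), ("DE", 4.0), ("US", 20.0)]:
    aggregates.record(country, amount)
```

Reading a group's result is then a single lookup, so the cost of grouping is paid once per write rather than on every query.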
