Cassandra column key auto increment


Question

I am trying to understand Cassandra and how to structure my column families (CFs), but it's quite hard since I am used to relational databases.

For example, if I create a simple users CF and try to insert a new row, how can I make an incremental key like in MySQL?

I saw a lot of examples where you would just use the username as the key instead of a unique ID, and that makes some sense, but what if I want to allow users to have duplicate usernames?

Also, how can I run searches when, from what I understand, Cassandra does not support the > operator, so something like select * from users where something > something2 would not work?

And probably the most important question: what about grouping? Would I need to retrieve all the data and then filter it in whatever language I am using? I think that would slow down my system a lot.

So basically I need a brief explanation of how to get started with Cassandra.

Recommended answer

Your questions are quite general, but let me take a stab at it. First, you need to model your data in terms of your queries. With an RDBMS, you model your data in some normalized form, then optimize later for your specific queries. You cannot do this with Cassandra; you must write your data the way you intend to read it. Often this means writing it more than one way. In general, it helps to completely shed your RDBMS thinking if you want to work effectively with Cassandra.

Regarding keys:

  • They are used in Cassandra as the unit of distribution across the ring. So your key will get hashed and assigned an "owner" in the ring. Use the RandomPartitioner to get an even distribution.

Presuming you use RandomPartitioner (you should), keys are not sorted. This means you cannot ask for a range of keys. You can, however, ask for a list of keys in a single query.
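To make the hashing idea concrete, here is a minimal Python sketch of how a key might be mapped to an owner. It assumes an MD5-based token, which is how RandomPartitioner is usually described. The names `token_for` and `owner_for`, the ring size, and the four-node ring are hypothetical and only for illustration; this is not Cassandra's actual code.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # illustrative token space (an assumption for this sketch)

def token_for(key: str) -> int:
    """Hash a row key to a position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def owner_for(key: str, node_tokens) -> str:
    """Return the node that owns `key`.

    `node_tokens` is a list of (token, node_name) sorted by token.
    The owner is the first node whose token is >= the key's token,
    wrapping around to the first node at the end of the ring.
    """
    tokens = [token for token, _ in node_tokens]
    index = bisect_left(tokens, token_for(key)) % len(node_tokens)
    return node_tokens[index][1]

# A hypothetical ring of four nodes with evenly spaced tokens.
nodes = [(i * RING_SIZE // 4, f"node{i}") for i in range(4)]

for row_key in ["alice", "alicf", "bob"]:
    print(row_key, token_for(row_key), owner_for(row_key, nodes))
```

Because the token comes from a hash, keys that sit next to each other alphabetically generally land at unrelated positions on the ring, which is why a range of keys is not a meaningful thing to ask for.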

Keys are relevant in some models and not in others. If your model requires query-by-key, you can use any unique value that your application is aware of (such as a UUID). Sometimes keys are sentinel values, such as a Unix epoch representing the start of the day. This allows you to hand Cassandra a bunch of known keys, then get a range of data sorted by column (see below).
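As a rough illustration of those two kinds of keys, the following Python sketch generates a UUID for a user row and computes start-of-day epoch keys for time-bucketed rows. The function names are hypothetical, and it assumes timezone-aware datetimes with day boundaries taken in UTC.

```python
import uuid
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def new_user_key() -> str:
    """A unique row key generated by the application, no central counter needed."""
    return str(uuid.uuid4())

def day_bucket_key(ts: datetime) -> int:
    """Unix epoch (seconds) of the start of the UTC day containing `ts`."""
    start = ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int(start.timestamp())

def day_bucket_keys(first_day: datetime, days: int) -> list:
    """The list of known keys to request for `days` consecutive days."""
    first = day_bucket_key(first_day)
    return [first + i * SECONDS_PER_DAY for i in range(days)]

print(new_user_key())
print(day_bucket_keys(datetime.now(timezone.utc), 3))
```

Since the application can compute every day key on its own, it can request a week of rows as a list of seven keys rather than needing a key range.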

Regarding query predicates:

  • You can get ranges of data, presuming you model it correctly to answer your queries.

Since columns are written in sorted order, you can query a range from column A to column N with a slice query (which is very fast). You can also use composite columns to abstract this mechanism a bit.
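The reason a slice is cheap can be sketched in plain Python: if the column names of a row are kept sorted, a range lookup is two binary searches plus a contiguous read. The `Row` class below is a hypothetical in-memory stand-in for a single Cassandra row, not a client API.

```python
from bisect import bisect_left, bisect_right, insort

class Row:
    """A toy model of one row whose columns are kept sorted by name."""

    def __init__(self):
        self._names = []   # column names, always sorted
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted; overwriting an existing column is allowed.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(name, self._values[name]) for name in self._names[lo:hi]]

# Columns named by timestamp, inserted out of order.
events = Row()
for ts, payload in [(50, "login"), (10, "signup"), (30, "edit"), (90, "logout")]:
    events.insert(ts, payload)

print(events.slice(20, 50))
```

This is also how a "greater than" style question is usually answered: put the value you want to compare on into the column name, then slice from that value onward.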

You can use secondary indexes on columns with low cardinality. This gives you query-by-value functionality.

You can create your own indexes where the data is sorted the way you need it.
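A hand-built index amounts to writing the same data a second time, in the order you plan to read it. The Python sketch below models this with in-memory structures: `users` stands in for a users CF keyed by UUID (so duplicate usernames are fine), and `users_by_age` stands in for an index row sorted by age. Both names and the `age` field are hypothetical examples, not anything from a Cassandra client library.

```python
import uuid
from bisect import bisect_right, insort

users = {}         # user_id -> profile: the primary copy, keyed by UUID
users_by_age = []  # sorted (age, user_id) pairs: the hand-built index

def add_user(username: str, age: int) -> str:
    """Write the profile and the index entry together."""
    user_id = str(uuid.uuid4())
    users[user_id] = {"username": username, "age": age}
    insort(users_by_age, (age, user_id))
    return user_id

def users_older_than(age: int) -> list:
    """Answer an 'age > X' query by reading the tail of the sorted index."""
    ages = [entry_age for entry_age, _ in users_by_age]
    start = bisect_right(ages, age)
    return [users[user_id] for _, user_id in users_by_age[start:]]

add_user("alice", 30)
add_user("alice", 41)  # same username, different key
add_user("bob", 25)

print(users_older_than(28))
```

The cost is an extra write per user and the need to keep both copies in step when a profile changes; the benefit is that the read never has to scan or filter the whole data set.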

Regarding grouping:

I presume you're referring to creating aggregates. If you need your data in real time, you'll want to use some external mechanism (like Storm) to track data and constantly write the relevant, updated aggregates into a CF. If you are creating aggregates as part of a batch process, Cassandra has excellent integration with Hadoop, allowing you to write map/reduce jobs in Pig, Hive, or directly in your language of choice.
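The real-time approach boils down to updating the aggregate at write time instead of grouping at read time. Here is a minimal Python sketch of that idea, with an in-memory class standing in for whatever external process maintains the aggregate CF. `RunningAggregates` and the country/order-value example are hypothetical.

```python
from collections import defaultdict

class RunningAggregates:
    """Keeps per-group count and sum up to date as each event arrives."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def record(self, group: str, value: float) -> None:
        # Called once per incoming event; in a real system the updated
        # figures would be written back to an aggregate column family.
        self.count[group] += 1
        self.total[group] += value

    def average(self, group: str):
        """Mean value for the group, or None if nothing was recorded."""
        if self.count[group] == 0:
            return None
        return self.total[group] / self.count[group]

# Hypothetical stream of (country, order value) events.
aggregates = RunningAggregates()
for country, order_value in [("DE", 10.0), ("US", 4.0), ("DE", 20.0)]:
    aggregates.record(country, order_value)

print(aggregates.count["DE"], aggregates.average("DE"))
```

Reading a group's total is then a single lookup, so nothing has to be retrieved and filtered in application code at query time.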
