How to get the last inserted row in Cassandra?

Question

I want to get the last inserted row in a Cassandra table. How can I do that?

I am working on a project in which I am replacing MySQL with Cassandra. I want to get rid of all the SQL queries and rewrite them for Cassandra.

Recommended answer

Just to impart a little understanding...

As with all Cassandra query problems, the query needs to be served by a model specifically designed for it. This is known as query-based modeling. Querying the last inserted row is not an intrinsic capability built into every table. You would need to design your model to support that ahead of time.

For instance, let's say I have a table storing data for users.

CREATE TABLE users (
  username TEXT,
  email TEXT,
  firstname TEXT,
  lastname TEXT,
  PRIMARY KEY (username));

If I were to run SELECT * FROM users LIMIT 1 on this table, my result set would contain a single row. That row would be the one whose username (my partition key) has the lowest hashed (token) value, because that is how Cassandra orders data in the cluster. I would have no way of knowing whether it was the last one added, so this wouldn't be terribly useful to you.
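To see that ordering for yourself, one option is to select the token alongside the key. This is only a sketch against the users table above; token() is a built-in CQL function, and the actual values depend on the partitioner your cluster uses.

```cql
-- Rows come back in token order, not insertion order.
-- token(username) shows the hashed value Cassandra sorts partitions by.
SELECT username, token(username) FROM users LIMIT 3;
```

The usernames will appear sorted by the second column, which bears no relation to when each row was written.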

On the other hand, let's say I had a table designed to track updates that users had made to their account info.

CREATE TABLE userUpdates (
  username TEXT,
  lastUpdated TIMEUUID,
  email TEXT,
  firstname TEXT,
  lastname TEXT,
  PRIMARY KEY (username,lastUpdated))
WITH CLUSTERING ORDER BY (lastUpdated DESC);

Next I'll upsert 3 rows and read the table back:

> INSERT INTO userUpdates (username,lastUpdated,email,firstname,lastname) 
  VALUES ('bkerman',now(),'bkerman@ksp.com','Bob','Kerman');
> INSERT INTO userUpdates (username,lastUpdated,email,firstname,lastname) 
  VALUES ('jkerman',now(),'jkerman@ksp.com','Jebediah','Kerman');
> INSERT INTO userUpdates (username,lastUpdated,email,firstname,lastname) 
  VALUES ('bkerman',now(),'bobkerman@ksp.com','Bob','Kerman');

> SELECT username, email, dateof(lastUpdated) FROM userupdates;

 username | email             | system.dateof(lastupdated)
----------+-------------------+----------------------------
  jkerman |   jkerman@ksp.com |   2016-02-17 15:31:39+0000
  bkerman | bobkerman@ksp.com |   2016-02-17 15:32:22+0000
  bkerman |   bkerman@ksp.com |   2016-02-17 15:31:38+0000

(3 rows)

If I just SELECT username, email, dateof(lastUpdated) FROM userupdates LIMIT 1, I'll get Jebediah Kerman's data, which is not the most recently updated row; it is simply the first row of the partition that comes first in token order. However, if I restrict the query to a single partition with username='bkerman', then LIMIT 1 gives me the most recent row for Bob Kerman.

> SELECT username, email, dateof(lastUpdated) FROM userupdates WHERE username='bkerman' LIMIT 1;

 username | email             | system.dateof(lastupdated)
----------+-------------------+----------------------------
  bkerman | bobkerman@ksp.com |   2016-02-17 15:32:22+0000

(1 rows)

This works because I specified a descending clustering order on lastUpdated:

WITH CLUSTERING ORDER BY (lastUpdated DESC);

In this way, results within each partition are returned with the most recently upserted row at the top, so LIMIT 1 becomes the way to query the most recent row.
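If you want the newest row of every partition in one query, more recent Cassandra versions offer a shortcut. The sketch below assumes Cassandra 3.6 or later, where PER PARTITION LIMIT is available; it also uses toTimestamp(), which replaced the deprecated dateOf() from Cassandra 2.2 onward. Neither appears in the original answer.

```cql
-- One row per username; thanks to CLUSTERING ORDER BY (lastUpdated DESC),
-- that row is the most recent update in each partition.
SELECT username, email, toTimestamp(lastUpdated)
FROM userUpdates
PER PARTITION LIMIT 1;
```

This still does not tell you which row was written last across the whole table, and without a WHERE clause it scans every partition, so it is best kept to small tables or ad hoc checks.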

In summary, it is important to understand that:

  • Cassandra orders data in the cluster by the hashed value of the partition key. This helps ensure more even data distribution.
  • Cassandra's CLUSTERING ORDER enforces the on-disk sort order of rows within a partition.
  • While you won't be able to get the most recently upserted row for a whole table, you can design a model that returns that row for each partition.

tl;dr: Querying in Cassandra is MUCH different from querying in MySQL or any other RDBMS. If querying the last upserted row (for a partition) is something you need to do, there are probably ways to model your table to support it.
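As one illustration of such a model, you could keep a second table partitioned by a time bucket so that "latest across all users" becomes a single-partition query. This is a sketch, not part of the original answer: the table name userUpdatesByDay and the day column are hypothetical, and the DATE type assumes Cassandra 2.2 or later.

```cql
-- Hypothetical table: every update for a given day lands in one partition,
-- sorted newest first.
CREATE TABLE userUpdatesByDay (
  day DATE,
  lastUpdated TIMEUUID,
  username TEXT,
  email TEXT,
  PRIMARY KEY (day, lastUpdated))
WITH CLUSTERING ORDER BY (lastUpdated DESC);

-- Write to this table alongside userUpdates.
INSERT INTO userUpdatesByDay (day, lastUpdated, username, email)
  VALUES ('2016-02-17', now(), 'bkerman', 'bobkerman@ksp.com');

-- Most recent update by any user on that day.
SELECT username, email, toTimestamp(lastUpdated)
FROM userUpdatesByDay
WHERE day = '2016-02-17'
LIMIT 1;
```

The trade-off is that all writes for a day go to a single partition, which can become a hot spot under heavy write load; a finer bucket (for example, per hour) spreads the load at the cost of sometimes having to check more than one bucket.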
