Copying data from one Cassandra table to another with TTL


Question

We are changing the partition key of one of our tables by removing one column from it. Every record in this table also has a TTL. We want to preserve the data in that table together with its TTL. How can we do it?

We can create a new table with the desired schema and then copy the data from the old table into it. However, we lose the TTLs in this process.

For further information: this Cassandra table is populated by an Apache Storm application which reads events from Kafka. We could re-hydrate the Kafka messages, but Kafka contains some unwanted messages which we don't want to process.

NOTE: the TTL is derived from a date column whose value never changes, so the TTL is always the same across all columns of a record.

Answer

Before going to a specific implementation, it's important to understand that a TTL may exist on an individual cell as well as on all cells in a row. Also, when you perform an INSERT or UPDATE operation you can apply only one TTL value to all columns specified in the query, so if you have 2 columns with different TTLs you'll need to perform 2 queries, one per column, each with its own TTL.
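For illustration, a minimal CQL sketch of that case (the table ks.events and its columns are hypothetical): col_a and col_b in the same row carry different TTLs, so each must be set by its own statement.

-- Each statement carries its own TTL (in seconds)
UPDATE ks.events USING TTL 86400  SET col_a = 'a' WHERE id = 1;
UPDATE ks.events USING TTL 604800 SET col_b = 'b' WHERE id = 1;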

Regarding the tooling, there are 2 more or less ready-to-use options here:

  • Use DSBulk. This approach is described in detail in example 30.1 of this blog post. Basically, you unload the data to disk using a query that extracts the column values and the TTLs for them, and then load the data back by generating a batch for every column that has a separate TTL. From the example:
dsbulk unload -h localhost -query \
  "SELECT id, petal_length, WRITETIME(petal_length) AS w_petal_length, TTL(petal_length) AS l_petal_length, .... FROM dsbulkblog.iris_with_id" \
  -url /tmp/dsbulkblog/migrate
dsbulk load -h localhost -query \
  "BEGIN BATCH INSERT INTO dsbulkblog.iris_with_id(id, petal_length) VALUES (:id, :petal_length) USING TIMESTAMP :w_petal_length AND TTL :l_petal_length; ... APPLY BATCH;" \
  -url /tmp/dsbulkblog/migrate --batch.mode DISABLED
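Note that --batch.mode DISABLED is needed here because the load query is itself a BEGIN BATCH ... APPLY BATCH statement, and DSBulk's own batching would otherwise try to wrap it in another batch.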

  • Use the Spark Cassandra Connector - it supports reading & writing data with TTL & WriteTime. But you'll need to develop the code that does it, and correctly handle things such as collections, static columns, etc. (or wait until SPARKC-596 is implemented). A sketch of the read/write round trip follows this list.
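Below is a minimal Scala sketch of that round trip, not a complete implementation: it assumes a hypothetical table ks.old_table(id, value) being copied to ks.new_table with the same schema, with a single non-key column value (collections, static columns, etc. would need the extra handling mentioned above).

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.{TTLOption, TimestampOption, WriteConf}

val conf = new SparkConf().set("spark.cassandra.connection.host", "localhost")
val sc = new SparkContext(conf)

// Read each row together with the TTL and write time of its "value" cell
val rows = sc.cassandraTable("ks", "old_table")
  .select("id", "value", "value".ttl as "value_ttl", "value".writeTime as "value_wt")

// Write to the new table, applying the preserved TTL and timestamp per row
rows.saveToCassandra("ks", "new_table",
  SomeColumns("id", "value"),
  writeConf = WriteConf(
    ttl = TTLOption.perRow("value_ttl"),
    timestamp = TimestampOption.perRow("value_wt")))

This works here because the question states that every record has a TTL; rows without a TTL would need a null check before the per-row options are applied.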
