Generate UUID automatically for all records in Cassandra for an existing dataset

Question

I have an existing dataset of around 700,000 records in CSV format, which I have imported into an Apache Cassandra table. The problem is the primary key: how can I automatically generate (upsert) a UUID into my primary key column for all of my records? I am using Cassandra 3.10.

Recommended answer

Unfortunately, if you're using the COPY command you don't really have any options for generating UUIDs on the fly for your rows. I think you really have two options, both of which involve doing things programmatically to one extent or another:


  1. Do some pre-processing on your CSV file to generate and add a UUID to each row, writing out a new file with that additional field and UUID value for each row. It should be pretty straightforward to process the file, line by line, and generate those values using a small Python script or something similar. Then you can use the COPY command like before to import the data into Cassandra.
  2. Since you're already going to be writing some code, skip using the COPY command altogether and just write the code in Python (or Java or your language of choice) to read the file, parse each CSV line into values, generate a UUID for that row, and then INSERT the data into Cassandra using the appropriate driver for the programming language you're using.
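Option 1 could be sketched roughly as follows. This is only a minimal example, not a definitive implementation: the function name `add_uuid_column`, the column name `id`, and the file names are hypothetical, and it assumes the CSV has a header row and that a random (version 4) UUID is acceptable as the key.

```python
import csv
import uuid


def add_uuid_column(src_path, dst_path, has_header=True, column_name="id"):
    """Copy src_path to dst_path, prepending a random UUID to every data row.

    Returns the number of data rows written. The names used here are
    illustrative assumptions, not part of the original question.
    """
    rows_written = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                # Add a header cell for the new UUID column.
                writer.writerow([column_name] + header)
        for row in reader:
            if not row:
                continue  # skip blank lines
            writer.writerow([str(uuid.uuid4())] + row)
            rows_written += 1
    return rows_written
```

Calling `add_uuid_column("records.csv", "records_with_uuid.csv")` would produce a new file that can then be loaded with the cqlsh COPY command, for example `COPY my_keyspace.records (id, name, email) FROM 'records_with_uuid.csv' WITH HEADER = TRUE;`. The keyspace, table, and column names in that command are placeholders, and the target table is assumed to declare `id uuid` as its primary key.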

If you decide to go with option 2, you'll find a list of the DataStax drivers for Cassandra towards the bottom of this page, along with documentation for how to use them. Hope that helps!
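For option 2, a rough sketch using the DataStax Python driver (the `cassandra-driver` package) might look like the code below. The contact point, keyspace, table, column names, and the helper names `insert_rows_with_uuid` and `run_import` are all assumptions made for illustration. It also assumes the CSV has a header row and that every non-key column is a text column; other column types would need the string values converted before binding.

```python
import csv
import uuid


def insert_rows_with_uuid(session, csv_path, insert_cql):
    """Read csv_path and INSERT each row with a freshly generated UUID.

    `session` is expected to be a cassandra-driver Session (anything with
    compatible prepare/execute methods works). `insert_cql` must have one
    `?` placeholder for the UUID followed by one per CSV column.
    """
    prepared = session.prepare(insert_cql)
    count = 0
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # assumption: the first line is a header
        for row in reader:
            if not row:
                continue  # skip blank lines
            # The driver accepts uuid.UUID objects for uuid columns.
            session.execute(prepared, [uuid.uuid4()] + row)
            count += 1
    return count


def run_import():
    # Imported here so the helper above can be used without the driver.
    from cassandra.cluster import Cluster  # pip install cassandra-driver

    cluster = Cluster(["127.0.0.1"])          # hypothetical contact point
    session = cluster.connect("my_keyspace")  # hypothetical keyspace
    try:
        n = insert_rows_with_uuid(
            session,
            "records.csv",
            # Hypothetical table and columns.
            "INSERT INTO records (id, name, email) VALUES (?, ?, ?)",
        )
        print(f"inserted {n} rows")
    finally:
        cluster.shutdown()
```

Running `run_import()` against a live cluster would load the file one row at a time. For a file of this size, sequential inserts may be slow; the driver's `cassandra.concurrent.execute_concurrent_with_args` helper is one way to send the same prepared statement with many parameter sets in parallel.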
