Cassandra Batch Insert in Python


Question

I need to do a batch INSERT in Cassandra using Python. I am using the latest DataStax Python driver.

The INSERTs are batches of columns that will be in the same row. I will have many rows to insert, but chunks of the data will be in the same row.

I can do individual INSERTs in a for loop as described in this post: Parameterized queries with the Python Cassandra Module. I am using a parameterized query with values, as shown in that example.

This did not help: How to multiple insert rows in cassandra

I am not clear how to assemble a parameterized INSERT:

BEGIN BATCH  
  INSERT(query values1)  
  INSERT(query values2)  
  ...  
APPLY BATCH;  
cursor.execute(batch_query)  

Is this even possible? Will this speed up my INSERTs? I have to do millions, and even thousands take too long. I found some Java info: http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0

Recommended answer

Intro: Right now the DataStax Python driver doesn't support the CQL protocol in Cassandra 2.0. It's a work in progress and betas should show up soon. At that point you'll be able to use a BATCH statement to which you can add bound prepared statements as needed.
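For readers on a later driver release, here is a minimal sketch of what that BATCH statement approach could look like. It assumes a cassandra-driver version that ships `BatchStatement` in `cassandra.query` and a cluster that speaks the newer protocol. The function name `insert_with_batch_statement` is hypothetical, and the table `T (id, fld1)` is borrowed from the example code further down.

```python
def insert_with_batch_statement(session, rows, batch_size=10):
    """Insert (id, fld1) tuples into table T, batch_size rows per BATCH.

    `session` is assumed to be an already-connected driver session.
    """
    # Imported inside the function so this sketch loads without the driver;
    # it needs a cassandra-driver release that provides BatchStatement.
    from cassandra.query import BatchStatement

    insert = session.prepare('INSERT INTO T (id, fld1) VALUES (?, ?)')
    batch = BatchStatement()
    pending = 0
    for row in rows:
        batch.add(insert, row)  # add one prepared statement with its values
        pending += 1
        if pending == batch_size:
            session.execute(batch)
            batch = BatchStatement()
            pending = 0
    if pending:
        session.execute(batch)  # flush the final, possibly smaller batch
```

Unlike a single prepared BATCH string, this variant does not need the batch size fixed when the statement is prepared, so the last batch can simply be shorter.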

Considering the above, the solution you could use is the one described in the post you've linked: prepare a statement that includes a BATCH with a series of INSERTs. The obvious downside of this solution is that you need to decide upfront how many INSERTs will be in your batch, and you also have to split your input data accordingly.

Example code:

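BATCH_SIZE = 10
# Trailing '; ' keeps consecutive INSERTs separated inside the batch.
INSERT_STMT = 'INSERT INTO T (id, fld1) VALUES (?, ?); '

# Build one BATCH statement containing BATCH_SIZE parameterized INSERTs.
BATCH_STMT = 'BEGIN BATCH '
for i in range(BATCH_SIZE):
    BATCH_STMT += INSERT_STMT
BATCH_STMT += 'APPLY BATCH;'

# session is assumed to be an already-connected driver session.
prep_batch = session.prepare(BATCH_STMT)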

Then, as you receive data, you can iterate over it, and for every BATCH_SIZE rows you bind them to the above prep_batch and execute it.
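The iterate-and-bind step might be sketched as follows. This is only an illustration under some assumptions: `rows` is a list of `(id, fld1)` tuples, `prep_batch` is the prepared BATCH from the example, and `prep_insert` is a hypothetical prepared single-row INSERT used for leftover rows that don't fill a whole batch. The helper names `chunks`, `flatten` and `insert_all` are made up for this sketch.

```python
BATCH_SIZE = 10

def chunks(rows, size):
    """Yield consecutive slices of at most `size` rows from a list."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def flatten(chunk):
    """Turn [(id, fld1), ...] into [id, fld1, id, fld1, ...] for binding."""
    return [value for row in chunk for value in row]

def insert_all(session, prep_batch, prep_insert, rows, batch_size=BATCH_SIZE):
    """Send full chunks through prep_batch; insert leftover rows singly."""
    for chunk in chunks(rows, batch_size):
        if len(chunk) == batch_size:
            # One bound parameter per '?' in the prepared BATCH statement.
            session.execute(prep_batch.bind(flatten(chunk)))
        else:
            # The prepared BATCH expects exactly batch_size rows, so a
            # short final chunk falls back to single-row inserts.
            for row in chunk:
                session.execute(prep_insert.bind(row))
```

The fallback for the last chunk reflects the downside noted in the answer: the batch size is fixed when the statement is prepared, so the input has to be split to match it.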
