How to insert a Pandas DataFrame into Cassandra?


Question

I have a DataFrame like the following:

df

date        time       open   high   low   last
01-01-2017  11:00:00   37      45     36    42
01-01-2017  11:23:00   36      43     33    38
01-01-2017  12:00:00   45      55     35    43

....


I want to write it into Cassandra. It's a kind of bulk upload after processing the data in Python.


The Cassandra schema is as follows:

CREATE TABLE ks.table1 (date text, time text, open float, high float, low float,
                        last float, PRIMARY KEY (date, time));
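Because PRIMARY KEY (date, time) makes (date, time) the row key, and an INSERT in Cassandra overwrites any existing row with the same key (it is an upsert), duplicate keys in the DataFrame silently collapse into one row. A minimal sketch for spotting them before uploading, assuming pandas and the column names shown in df above; the helper names are hypothetical:

```python
import pandas as pd

# (date, time) is the primary key of ks.table1; date alone is the partition key
PRIMARY_KEY = ["date", "time"]


def find_duplicate_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Return every row whose (date, time) key occurs more than once."""
    return df[df.duplicated(subset=PRIMARY_KEY, keep=False)]


def keep_last_per_key(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the last row for each (date, time) key, in DataFrame order."""
    return df.drop_duplicates(subset=PRIMARY_KEY, keep="last")
```

Keeping the last row per key approximates what sequential, synchronous inserts in DataFrame order would leave in the table; with concurrent inserts, which row survives is not guaranteed.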


To insert a single row into Cassandra we can use cassandra-driver in Python, but I couldn't find any details about uploading an entire DataFrame.
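Whatever upload method is used, the DataFrame first has to become a sequence of value tuples in the same column order as the INSERT statement. A minimal sketch, assuming pandas and the column names shown in df; dataframe_to_rows is a hypothetical helper, not part of cassandra-driver:

```python
import pandas as pd

# Column order must match the INSERT column list for ks.table1
COLUMNS = ["date", "time", "open", "high", "low", "last"]


def dataframe_to_rows(df: pd.DataFrame) -> list:
    """Convert df into a list of plain tuples, one per row, ready to bind."""
    # Selecting columns by name means a different column order in df
    # cannot silently shift values into the wrong Cassandra columns.
    subset = df[COLUMNS].astype({
        "date": str, "time": str,            # text columns in the schema
        "open": float, "high": float,        # float columns in the schema
        "low": float, "last": float,
    })
    # name=None yields plain tuples instead of namedtuples
    return list(subset.itertuples(index=False, name=None))
```

Each returned tuple can then be passed as the parameters of one INSERT.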

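from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])  # assumed contact point; use your own node addresses
session = cluster.connect()

# date and time are text columns, so their values must be quoted string literals
session.execute(
    """
    INSERT INTO ks.table1 (date, time, open, high, low, last)
    VALUES ('01-01-2017', '11:00:00', 37, 45, 36, 42)
    """)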


P.S.: A similar question has been asked before, but it doesn't answer my question.

Recommended answer


I was facing this problem too, but I found that even uploading millions of rows (19 million, to be exact) into Cassandra didn't take much time.


As for your problem, you can use the Cassandra bulk loader to get the job done.
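The answer does not show the bulk loader itself. As one file-based alternative (a swapped-in technique, not necessarily the tool the answer meant), the DataFrame can be exported to CSV and loaded with cqlsh's COPY FROM command. A minimal sketch, assuming pandas and the ks.table1 schema from the question; the helper names and the file name table1.csv are hypothetical:

```python
import pandas as pd

# Column order of ks.table1, used for both the CSV and the COPY column list
COLUMNS = ["date", "time", "open", "high", "low", "last"]


def to_copy_csv(df: pd.DataFrame) -> str:
    """Render df as CSV text (header row plus data) in ks.table1 column order."""
    return df.to_csv(columns=COLUMNS, index=False)


def copy_command(path: str) -> str:
    """Build the cqlsh COPY FROM statement that loads the CSV at path."""
    return (f"COPY ks.table1 ({', '.join(COLUMNS)}) "
            f"FROM '{path}' WITH HEADER = TRUE;")


# Usage (hypothetical file name):
#   with open("table1.csv", "w", newline="") as f:
#       f.write(to_copy_csv(df))
# then run the string returned by copy_command("table1.csv") inside cqlsh.
```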


You can use prepared statements to upload data into the Cassandra table while iterating through the DataFrame.

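    from cassandra.cluster import Cluster

    cluster = Cluster([ip_address])           # Cluster expects a list of contact points
    session = cluster.connect(keyspace_name)  # e.g. 'ks' from the question's schema
    query = "INSERT INTO table1 (date, time, open, high, low, last) VALUES (?, ?, ?, ?, ?, ?)"
    prepared = session.prepare(query)         # parsed once, then reused for every row

The "?" markers are placeholders for the values bound on each execution.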

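    # Iterate over rows; iterating a DataFrame directly yields column labels, not rows
    for row in dataFrame.itertuples(index=False):
        session.execute(prepared, (row.date, row.time, row.open, row.high, row.low, row.last))

    # Or by position (assumes the columns are already in schema order)
    for row in dataFrame.itertuples(index=False):
        session.execute(prepared, (row[0], row[1], row[2], row[3], row[4], row[5]))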
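Calling session.execute() once per row waits for each round trip before sending the next row. A sketch of a variant that can reduce total upload time, assuming cassandra-driver's session.execute_async(), which returns a future whose result() blocks until that write finishes (and raises if it failed); insert_concurrently and max_in_flight are hypothetical names. The driver also ships a ready-made helper for this, cassandra.concurrent.execute_concurrent_with_args:

```python
from collections import deque


def insert_concurrently(session, prepared, rows, max_in_flight=100):
    """Insert rows via execute_async, keeping at most max_in_flight requests pending.

    Returns the number of rows written. A failed insert raises when its
    future's result() is called.
    """
    pending = deque()
    written = 0
    for params in rows:
        pending.append(session.execute_async(prepared, params))
        if len(pending) >= max_in_flight:
            pending.popleft().result()  # wait for the oldest outstanding write
            written += 1
    while pending:                      # drain whatever is still in flight
        pending.popleft().result()
        written += 1
    return written


# Usage with the session and prepared statement from the answer:
#   insert_concurrently(session, prepared, dataFrame.itertuples(index=False, name=None))
```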


In other words, use a for loop to extract each row and upload it with session.execute().
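If per-row round trips are still too slow, rows that share a partition key can be grouped into batches. In ks.table1 the partition key is date (the first PRIMARY KEY component), and batches confined to a single partition avoid the coordinator overhead of multi-partition batches. A sketch assuming cassandra-driver's BatchStatement and BatchType; the helper names and the default chunk size of 50 are hypothetical and should be tuned against the server's batch size warnings:

```python
from collections import defaultdict


def batches_by_partition(rows, max_batch_size=50):
    """Group (date, time, open, high, low, last) tuples by partition key (date).

    Yields lists of at most max_batch_size rows that all share one date,
    so each batch touches a single partition.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # row[0] is date, the partition key
    for key_rows in groups.values():
        for start in range(0, len(key_rows), max_batch_size):
            yield key_rows[start:start + max_batch_size]


def insert_in_batches(session, prepared, rows, max_batch_size=50):
    """Send one UNLOGGED batch per chunk produced by batches_by_partition."""
    # Imported here so the grouping helper above works without the driver installed
    from cassandra.query import BatchStatement, BatchType

    for chunk in batches_by_partition(rows, max_batch_size):
        batch = BatchStatement(batch_type=BatchType.UNLOGGED)
        for params in chunk:
            batch.add(prepared, params)
        session.execute(batch)
```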

More about prepared statements
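For comparison with the prepared statement in the answer: the question's single-row INSERT can also be parameterized without preparing it. With a plain (simple) statement, cassandra-driver uses %s placeholders instead of ? and fills them in client-side, quoting strings such as date and time itself. A minimal sketch; INSERT_SIMPLE and insert_row are hypothetical names:

```python
# Simple (non-prepared) statements take %s placeholders, not ?
INSERT_SIMPLE = (
    "INSERT INTO ks.table1 (date, time, open, high, low, last) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def insert_row(session, row):
    """Insert one (date, time, open, high, low, last) tuple as a simple statement."""
    session.execute(INSERT_SIMPLE, row)


# Usage with a real session:
#   insert_row(session, ("01-01-2017", "11:00:00", 37, 45, 36, 42))
```

Preparing is still the better choice for many rows, because the query is parsed once on the server rather than for every execution.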

Hope this helps.
