Creating timeuuid for Cassandra inserts with Apache Spark


Question

I am toying with Apache Spark and Apache Cassandra for data analytics, and I am struggling with inserting back into Cassandra with timeuuid fields.

I have the following table:

CREATE TABLE leech_seed_report.daily_sessions (
    id timeuuid PRIMARY KEY,
    app int,
    count int,
    date bigint,
    offline boolean,
    vendor text,
    version text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX daily_sessions_app_idx ON leech_seed_report.daily_sessions (app);
CREATE INDEX daily_sessions_date_idx ON leech_seed_report.daily_sessions (date);
CREATE INDEX daily_sessions_offline_idx ON leech_seed_report.daily_sessions (offline);
CREATE INDEX daily_sessions_vendor_idx ON leech_seed_report.daily_sessions (vendor);
CREATE INDEX daily_sessions_version_idx ON leech_seed_report.daily_sessions (version);

I am using:

rows.saveToCassandra("leech_seed_report", "daily_sessions", SomeColumns("id", "date", "app", "vendor", "version", "offline", "count"))

My rows consist of tuples of the format:

([timeuuid_will_be_here], BigInt, Int, String, String, Boolean, Int)

I have played around with inserting into the same table without the timeuuid field and it all works fine, but I can't for the life of me work out how to create a timeuuid for each row.

Any help would be greatly appreciated. I'm new to Spark, Cassandra and Scala, and I feel like I'm banging my head against a brick wall.

Thanks, Matt.

Recommended answer

Import com.datastax.driver.core.utils.UUIDs and call UUIDs.timeBased() to generate a timeuuid.

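In your case, SomeColumns takes only column names, so the timeuuid has to be generated for each row before saving. Assuming rows holds tuples of the six non-id fields:

import com.datastax.driver.core.utils.UUIDs
// Prepend a fresh timeuuid to every row, then save with "id" as the first column
rows.map { case (date, app, vendor, version, offline, count) => (UUIDs.timeBased(), date, app, vendor, version, offline, count) }
  .saveToCassandra("leech_seed_report", "daily_sessions", SomeColumns("id", "date", "app", "vendor", "version", "offline", "count"))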
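
To see what UUIDs.timeBased() returns before wiring it into a Spark job, a small standalone check can help. This is only a sketch: it assumes the DataStax Java driver 3.x (the version that ships the com.datastax.driver.core.utils.UUIDs class named above) is on the classpath, and the object name TimeuuidCheck and its generate helper are hypothetical names made up for the illustration.

```scala
import java.util.UUID
import com.datastax.driver.core.utils.UUIDs

// Hypothetical helper object, not part of the driver or the connector.
object TimeuuidCheck {
  // Seq.fill evaluates its argument once per element,
  // so every entry is a separately generated time-based UUID.
  def generate(n: Int): Seq[UUID] = Seq.fill(n)(UUIDs.timeBased())

  def main(args: Array[String]): Unit = {
    generate(3).foreach { id =>
      // version() is 1 for time-based UUIDs;
      // UUIDs.unixTimestamp reads back the embedded time in milliseconds.
      println(s"$id version=${id.version()} millis=${UUIDs.unixTimestamp(id)}")
    }
  }
}
```

Because every call yields a new value, the function has to be invoked once per row rather than once for the whole RDD.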
