How to read data from Cassandra with R?

Question

I am using R 2.14.1 and Cassandra 1.2.11. A separate program has written data to a single Cassandra table, but I am failing to read it from R.

The Cassandra schema is defined like this:

CREATE TABLE chosen_samples (id bigint, temperature double, PRIMARY KEY (id));

I first tried the RCassandra package (http://www.rforge.net/RCassandra/):

> # install.packages("RCassandra")
> library(RCassandra)
> rc <- RC.connect(host ="192.168.33.10", port = 9160L)
> RC.use(rc, "poc1_samples")
> cs <- RC.read.table(rc, c.family="chosen_samples")

The connection seems to succeed, but parsing the table into a data frame fails:

> cs
Error in data.frame(..dfd. = c("@"ffffff", "@(<cc><cc><cc><cc><cc><cd>",  : 
  duplicate row.names: 

I also tried the JDBC connector, as described here: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive

> # install.packages("RJDBC")
> library(RJDBC)
> cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", "/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar", "`")

But this one fails like this:

Error in .jfindClass(as.character(driverClass)[1]) : class not found

This happens even though the path to the Java driver JAR is correct:

$ ls /Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar

Recommended answer

This question is old now, but since it's one of the top hits for R and Cassandra, I thought I'd leave a simple solution here, as I found frustratingly little up-to-date support for what I thought would be a fairly common task.

Sparklyr makes this pretty easy to do from scratch now, as it exposes a Java context, so the Spark-Cassandra-Connector can be used directly. I've wrapped up the bindings in a simple package, crassy, but it isn't necessary to use it.

I mostly made it to demystify the configuration needed to make sparklyr load the connector, and because the syntax for selecting a subset of columns is a little unwieldy (assuming no Scala knowledge).
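For readers who have not loaded a Spark package from sparklyr before, that configuration could look roughly like the sketch below. It is not crassy's implementation: instead of calling the connector through the Java context, it swaps in sparklyr's generic `spark_read_source()` with the connector's `org.apache.spark.sql.cassandra` data source. The helper names `cassandra_config` and `read_cassandra_table` are hypothetical, the use of the `sparklyr.shell.packages` setting and the connector coordinates are assumptions (the coordinates must match your Spark and Scala versions), and the host, keyspace and table names are taken from the question.

```r
library(sparklyr)

# Hypothetical helper: build a sparklyr config that asks spark-submit to
# download the Spark-Cassandra-Connector and points it at a Cassandra host.
cassandra_config <- function(host, connector) {
  config <- spark_config()
  # Assumption: sparklyr.shell.* settings are forwarded to spark-submit,
  # so this becomes the --packages argument.
  config$sparklyr.shell.packages <- connector
  # Connector setting naming the Cassandra contact point.
  config$spark.cassandra.connection.host <- host
  config
}

# Hypothetical helper: register a Cassandra table as a lazy Spark table.
read_cassandra_table <- function(sc, keyspace, table) {
  spark_read_source(
    sc,
    name = table,
    source = "org.apache.spark.sql.cassandra",
    options = list(keyspace = keyspace, table = table),
    memory = FALSE  # do not cache the whole table in Spark memory
  )
}

# Example session; needs a local Spark installation and a reachable cluster.
if (interactive()) {
  config <- cassandra_config(
    host = "192.168.33.10",
    # Assumed coordinates; pick the build matching your Spark/Scala version.
    connector = "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1"
  )
  sc <- spark_connect(master = "local", config = config)
  samples <- read_cassandra_table(sc, "poc1_samples", "chosen_samples")
  print(head(samples))
  spark_disconnect(sc)
}
```

Keeping the configuration in its own function makes it easy to inspect before connecting, which helps when the connector fails to load.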

Column selection and partition filtering are supported. These were the only features I thought were necessary for general Cassandra use cases, given that CQL can't be submitted directly to the cluster.
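As an illustration of what column selection and partition filtering mean for the table in the question, the same operations can be written with ordinary dplyr verbs, which sparklyr translates to Spark SQL when they are applied to a Spark table. This is a sketch and not crassy's syntax: the function name `select_samples` is hypothetical, the column names come from the question's schema, and whether the filter is pushed down to Cassandra or applied inside Spark depends on the connector and Spark version.

```r
library(dplyr)

# Hypothetical helper: keep only the rows whose partition key (id) is in
# wanted_ids, and only the two columns of interest.
# samples_tbl can be a Spark table (for example one created with
# sparklyr::spark_read_source) or a plain data frame for local testing.
select_samples <- function(samples_tbl, wanted_ids) {
  samples_tbl %>%
    filter(id %in% !!wanted_ids) %>%  # !! inlines the local vector
    select(id, temperature)
}

# Local dry run on a data frame with made-up values, no cluster needed.
local_samples <- data.frame(
  id = c(1, 2, 3),
  temperature = c(20.5, 21.0, 19.0),
  extra = c("a", "b", "c")
)
filtered <- select_samples(local_samples, c(1, 3))
print(filtered)
```

On a Spark table the result stays lazy; calling `collect()` on it brings the selected rows into an R data frame.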

I've not found a way to submit more general CQL queries that doesn't involve writing custom Scala; however, there's an example of how this can work here.
