How to read data from Cassandra with R?
Question
I am using R 2.14.1 and Cassandra 1.2.11. A separate program has written data to a single Cassandra table, but I am failing to read it from R.
The Cassandra schema is defined like this:
create table chosen_samples (id bigint , temperature double, primary key(id))
I first tried the RCassandra package (http://www.rforge.net/RCassandra/):
> # install.packages("RCassandra")
> library(RCassandra)
> rc <- RC.connect(host ="192.168.33.10", port = 9160L)
> RC.use(rc, "poc1_samples")
> cs <- RC.read.table(rc, c.family="chosen_samples")
The connection seems to succeed, but parsing the table into a data frame fails:
> cs
Error in data.frame(..dfd. = c("@"ffffff", "@(<cc><cc><cc><cc><cc><cd>", :
duplicate row.names:
I also tried the JDBC connector, as described here: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive
> # install.packages("RJDBC")
> library(RJDBC)
> cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", "/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar", "`")
But this one fails with:
Error in .jfindClass(as.character(driverClass)[1]) : class not found
This happens even though the path to the Java driver is correct:
$ ls /Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
Recommended answer
This question is old now, but since it is one of the top hits for R and Cassandra, I thought I'd leave a simple solution here. I found frustratingly little up-to-date support for what I thought would be a fairly common task.
Sparklyr makes this pretty easy to do from scratch now: it exposes a Java context, so the Spark-Cassandra-Connector can be used directly. I've wrapped up the bindings in a simple package, crassy, but it isn't necessary to use it.
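As a rough illustration of that approach without crassy, here is a minimal sketch that loads the connector through the sparklyr configuration and reads a table via the connector's data source. The connector coordinates, the use of the `sparklyr.defaultPackages` setting, and the helper names `cassandra_config` and `read_cassandra_table` are assumptions made for this example. They are not taken from crassy, and the right connector build depends on your Spark and Scala versions.

```r
library(sparklyr)

# ASSUMPTION: example Maven coordinates for the Spark-Cassandra-Connector.
# Pick the build that matches your own Spark and Scala versions.
connector_pkg <- "com.datastax.spark:spark-cassandra-connector_2.11:2.0.0"

# Hypothetical helper: build a sparklyr config that loads the connector
# and points it at a Cassandra node.
cassandra_config <- function(host, connector = connector_pkg) {
  config <- spark_config()
  # ASSUMPTION: this sparklyr version reads extra packages from this setting.
  config[["sparklyr.defaultPackages"]] <-
    c(config[["sparklyr.defaultPackages"]], connector)
  config[["spark.cassandra.connection.host"]] <- host
  config
}

# Hypothetical helper: register a Cassandra table as a Spark DataFrame.
read_cassandra_table <- function(sc, keyspace, table) {
  spark_read_source(
    sc,
    name = table,
    source = "org.apache.spark.sql.cassandra",
    options = list(keyspace = keyspace, table = table),
    memory = FALSE  # do not cache the whole table in Spark memory
  )
}

# Example usage (needs a local Spark install and a reachable Cassandra node):
# sc <- spark_connect(master = "local", config = cassandra_config("192.168.33.10"))
# samples <- read_cassandra_table(sc, "poc1_samples", "chosen_samples")
# head(samples)
# spark_disconnect(sc)
```

The host, keyspace, and table names in the usage comments are the ones from the question.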
I mostly made it to demystify the configuration needed to make sparklyr load the connector, and because the syntax for selecting a subset of columns is a little unwieldy (assuming no Scala knowledge).
Column selection and partition filtering are supported. These were the only features I thought were necessary for general Cassandra use cases, given that CQL can't be submitted directly to the cluster.
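For comparison, column selection and key filtering can also be written with plain dplyr verbs on a table that sparklyr has already registered. This is only a sketch and not crassy's interface: `select_samples` is a hypothetical helper, the column names come from the schema in the question, and whether the filter is actually pushed down to Cassandra rather than evaluated in Spark depends on the connector and on the filtered column being part of the partition key.

```r
library(dplyr)

# Hypothetical helper: keep only the `id` and `temperature` columns and the
# rows whose `id` is in `ids`, then bring the result into R.
# `samples` may be a Spark table from sparklyr or an ordinary data frame.
select_samples <- function(samples, ids) {
  samples %>%
    select(id, temperature) %>%   # column selection
    filter(id %in% !!ids) %>%     # filter on the primary key column
    collect()                     # materialise the result as a local tibble
}

# Example usage, assuming `samples` is a Spark table backed by Cassandra:
# wanted <- select_samples(samples, c(1, 2, 3))
```

Because the helper uses only dplyr verbs, it can be tried on a local data frame before pointing it at a cluster.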
I've not found a way to submit more general CQL queries that doesn't involve writing custom Scala; however, there's an example of how this can work here.