Get distinct partition keys from C* table

Views: 66 | Published: 2020/9/29 19:57:17 | Tags: cassandra, datastax-enterprise, cqlsh

Problem description

cqlsh doesn't allow nested queries, so I can't export the selected data to CSV. I'm trying to export the selected data (about 200,000 rows with a single column) from Cassandra using:

echo "SELECT distinct imei FROM listener.snapshots;" > select.cql
bin/cqlsh -f select.cql > output.txt

It just hangs forever without any error, and the file isn't growing.

If I run strace on the last command, I get many lines like:

select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 8000})  = 0 (Timeout)

And with --debug:

cqlsh --debug -f select.cql > output.txt

Using CQL driver: <module 'cassandra' from '/usr/share/dse/resources/cassandra/bin/../lib/cassandra-driver-internal-only-2.5.1.zip/cassandra-driver-2.5.1/cassandra/__init__.py'>

What's wrong?
Is there a better way to get the distinct partition keys from a large C* table?

Recommended answer

I used CAPTURE:

cqlsh> CAPTURE 'temp.csv'                                              
Now capturing query output to 'temp.csv'.
cqlsh> SELECT distinct imei FROM listener.snapshots;
---MORE---
---MORE---
---MORE---
---MORE---
.
.
.
cqlsh> 
cqlsh>

Then press Enter until it finishes.

A faster option is to turn paging off:

cqlsh> PAGING off
Disabled Query paging.
cqlsh> CAPTURE 'temp.csv'                                              
Now capturing query output to 'temp.csv'.
cqlsh> SELECT distinct imei FROM listener.snapshots;

This writes the data to the file immediately (if you get an OperationTimedOut error, you should edit the timeout settings in cassandra.yaml).
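The answer does not say which timeout settings to edit. As a rough starting point, the sketch below lists the active timeout entries in a cassandra.yaml file so they can be reviewed before changing anything. It assumes a Cassandra 2.x/3.x-style file in which server-side timeouts have names ending in `_timeout_in_ms` (for example `read_request_timeout_in_ms` and `range_request_timeout_in_ms`); `show_timeouts` is a hypothetical helper name, and the file path in the usage comment is an assumption that depends on how Cassandra or DSE was installed.

```shell
# Hypothetical helper: print the active (uncommented) *_timeout_in_ms
# settings from the cassandra.yaml file given as the first argument.
show_timeouts() {
  # Lines starting with '#' are commented out and are skipped by the anchor.
  grep -E '^[a-z_]+_timeout_in_ms:' "$1"
}

# Usage (path is an assumption; adjust it to your installation):
# show_timeouts /etc/cassandra/cassandra.yaml
```

A full-table `SELECT distinct` is a range scan, so `range_request_timeout_in_ms` is probably the first value to look at. OperationTimedOut can also be raised on the client side, in which case changing server settings alone may not be enough.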

I can't believe this is the fastest way there is... I know I can export data with Spark by using CassandraSQLContext, but it's not so fast when I need to create an RDD by querying C* for a distinct column out of a very large table (~2B rows) and print the values to a file:

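    import java.io.{File, PrintWriter}

    import com.datastax.spark.connector.cql.CassandraConnector
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.cassandra.CassandraSQLContext

    val conf = new SparkConf().setAppName("ExtractDistinctImeis")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)       // not used below
    val connector = CassandraConnector(conf)  // not used below
    val cc = new CassandraSQLContext(sc)

    // Query Cassandra through Spark SQL and keep the first column as a string.
    val snapshots_imeis = cc.sql("select distinct imei from listener.snapshots").map(row => row(0).toString)

    // Pull every distinct value back to the driver.
    val imeis = snapshots_imeis.collect

    // Run `op` against a PrintWriter on `f`, closing the writer afterwards.
    def printToFile(f: File)(op: PrintWriter => Unit): Unit = {
        val p = new PrintWriter(f)
        try { op(p) } finally { p.close() }
    }

    printToFile(new File("/path/to/file.txt")) { p => imeis.foreach(p.println) }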

It took 3.5 hours with Spark! With CAPTURE I managed to get my file after 3 min/3 sec.
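One caveat about the captured file: CAPTURE records cqlsh's tabular output, not a true CSV. The sketch below shows one way to reduce such a file to one value per line. It assumes a single selected column, paging turned off, and the usual cqlsh layout: a column header, a dashed separator line, padded values, and a trailing "(N rows)" footer. `clean_capture` is a hypothetical helper name, and the layout may differ between cqlsh versions, so check a few lines of the real file first.

```shell
# Hypothetical helper: strip cqlsh table decoration from a captured file
# and print one trimmed value per line.
clean_capture() {
  awk '
    /^-+$/ { seen_sep = 1; next }        # dashed separator: data starts after it
    !seen_sep { next }                   # skip everything before it (blank line, header)
    /^\([0-9]+ rows?\)$/ { next }        # skip the "(N rows)" footer
    /^[ \t]*$/ { next }                  # skip blank lines
    { gsub(/^[ \t]+|[ \t]+$/, ""); print }  # trim the column padding
  ' "$1"
}

# Usage: clean_capture temp.csv > imeis.txt
```

The filter keys on the dashed separator rather than on the header text, so it does not need to know the column name.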
