cassandra.cluster.NoHostAvailable: 'Unable to complete the operation against any hosts' when querying a lot of data


Problem description

I use this code to query data from Cassandra:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import SimpleStatement
import pandas as pd

cluster = Cluster(contact_points=['192.168.2.4'],port=9042)
session = cluster.connect()

def testContectRemoteDatabase():
    contact_points = ['192.168.2.4']
    auth_provider = PlainTextAuthProvider(username='XXX', password='XX')
    cluster = Cluster(contact_points=contact_points, auth_provider=auth_provider)
    session = cluster.connect()
    cql_str = 'select * from DB1.mytable ;'
    simple_statement = SimpleStatement(cql_str, consistency_level=ConsistencyLevel.ONE,fetch_size=2000000)
    execute_result = session.execute(simple_statement, timeout=None)
    result = execute_result._current_rows
    cluster.shutdown()
    df = pd.DataFrame(result)
    df.to_csv('./my_test.csv', index=False, mode='w', header=True)

if __name__ == '__main__':
    testContectRemoteDatabase()

With fetch_size=1000000 there is no error, but with fetch_size=2000000 I get this error:

Traceback (most recent call last):
  File "test.py", line 24, in <module>
    testContectRemoteDatabase()
  File "test.py", line 17, in testContectRemoteDatabase
    execute_result = session.execute(simple_statement, timeout=None)
  File "cassandra\cluster.py", line 2618, in cassandra.cluster.Session.execute
  File "cassandra\cluster.py", line 4877, in cassandra.cluster.ResponseFuture.result
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.2.4:9042 datacenter1>: ConnectionShutdown('errors=Connection heartbeat timeout after 30 seconds, last_host=192.168.2.4:9042')})

How can I solve this?

Recommended answer

As Erick described, your code isn't optimal from Cassandra's point of view. It also won't work once the table holds more data than you have available memory, because it tries to pull everything into a single page and then into one DataFrame.
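If you do want to stay in Python, one way to avoid holding the whole table in memory is to keep fetch_size small and iterate over the result set, writing each row as it arrives. The following is only a minimal sketch under some assumptions: the helper names write_rows_to_csv and export_table are hypothetical, fetch_size=5000 is just an illustrative value, and the rows are assumed to come from the driver's default named-tuple row factory (so each row has a _fields attribute). The driver is imported inside export_table so that the CSV helper can be tried without a cluster.

```python
import csv


def write_rows_to_csv(rows, path):
    """Stream rows to a CSV file one at a time and return the row count.

    Assumes each row is namedtuple-like (has a `_fields` attribute).
    """
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            if count == 0:
                # Take the header from the first row's field names.
                writer.writerow(row._fields)
            writer.writerow(row)
            count += 1
    return count


def export_table(contact_points, username, password, cql, path, fetch_size=5000):
    """Hypothetical helper: run `cql` with paging and stream the rows to `path`."""
    # Imported here so the CSV helper above works without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    auth_provider = PlainTextAuthProvider(username=username, password=password)
    cluster = Cluster(contact_points=contact_points, auth_provider=auth_provider)
    try:
        session = cluster.connect()
        statement = SimpleStatement(
            cql,
            consistency_level=ConsistencyLevel.ONE,
            fetch_size=fetch_size,  # rows per page, not rows in total
        )
        # Iterating the result set fetches the next page when the current
        # one is exhausted, so only one page is held in memory at a time.
        return write_rows_to_csv(session.execute(statement), path)
    finally:
        cluster.shutdown()
```

A call such as export_table(['192.168.2.4'], 'XXX', 'XX', 'select * from DB1.mytable', './my_test.csv') would then replace the body of testContectRemoteDatabase. Note that a full-table scan is still expensive on the server side, which is why a dedicated tool is usually the better choice for exports.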

If you just need to export data from the database to CSV or another format, don't reinvent the wheel; use DSBulk instead. It is as simple as:

dsbulk unload -k keyspace -t table -u user -p password -url filename

See the following blog posts for examples:

  • https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
  • https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
  • https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
  • https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
  • https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
  • https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
