how can I add csv to cassandra db?
Problem description
I know it can be done in the traditional way, but if I were to use Cassandra DB, is there an easy, quick, and agile way to add a CSV to the DB as a set of key-value pairs?
The ability to add time-series data coming in via a CSV file is my prime requirement. I am OK with switching to any other database, such as MongoDB or Riak, if it is conveniently doable there.
Edit 2 Dec 02, 2017
Please use port 9042. Cassandra access has moved to CQL, whose default port is 9042; 9160 was the default port for Thrift.
Edit 1
There is a better way to do this without any coding. Look at this answer https://stackoverflow.com/a/18110080/298455
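For reference, the no-coding route in that answer is cqlsh's COPY command; a minimal sketch against the example table defined below (the column list and file name are assumptions from the example):

```
cqlsh:mykeyspace> COPY stackoverflow_question (id, name, class)
                  FROM 'data.csv' WITH HEADER = TRUE;
```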
However, if you want to pre-process or do something custom, you may want to do it yourself. Here is a lengthy method:
Create a column family.
cqlsh> create keyspace mykeyspace
       with strategy_class = 'SimpleStrategy'
       and strategy_options:replication_factor = 1;
cqlsh> use mykeyspace;
cqlsh:mykeyspace> create table stackoverflow_question
                  (id text primary key, name text, class text);
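Note that the `strategy_class = ...` form above is the old CQL 2 syntax; in current cqlsh (CQL 3) the equivalent keyspace definition would be:

```
cqlsh> create keyspace mykeyspace
       with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
```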
Assuming your CSV is like this:
$ cat data.csv
id,name,class
1,hello,10
2,world,20
Write some simple Python code to read the file and dump it into your CF. Something like this:
import csv
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('mykeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, "stackoverflow_question")

with open('data.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print str(row)
        # the id column becomes the row key; the rest become columns
        key = row['id']
        del row['id']
        cf.insert(key, row)

pool.dispose()
Execute this:
$ python loadcsv.py
{'class': '10', 'id': '1', 'name': 'hello'}
{'class': '20', 'id': '2', 'name': 'world'}
Look at the data:
cqlsh:mykeyspace> select * from stackoverflow_question;

 id | class | name
----+-------+-------
  2 |    20 | world
  1 |    10 | hello
See also:
a. Beware of DictReader
b. Look at Pycassa
c. Google for existing CSV loaders for Cassandra. I guess there are some.
d. There may be a simpler way using the CQL driver; I do not know.
e. Use appropriate data types. I just made them all text. Not good.
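On point (d), and per Edit 2 above, a minimal sketch of the same CSV load using the DataStax cassandra-driver over CQL on port 9042; the keyspace, table, and file names come from the example above, everything else (helper names, prepared statement) is my own assumption:

```python
import csv

def read_rows(path):
    """Parse the CSV into a list of dicts, one per data row."""
    with open(path, newline='') as f:
        return list(csv.DictReader(f))

def load_rows(rows, keyspace='mykeyspace'):
    """Insert parsed rows over CQL; assumes `pip install cassandra-driver`
    and a node listening on 9042 (per Edit 2)."""
    # Import here so read_rows() stays usable without the driver installed.
    from cassandra.cluster import Cluster
    cluster = Cluster(['localhost'], port=9042)
    session = cluster.connect(keyspace)
    insert = session.prepare(
        'INSERT INTO stackoverflow_question (id, name, class) '
        'VALUES (?, ?, ?)')
    for row in rows:
        session.execute(insert, (row['id'], row['name'], row['class']))
    cluster.shutdown()

# usage: load_rows(read_rows('data.csv'))
```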
HTH
I did not see the time-series requirement. Here is how you do it for time series.
This is your data
$ cat data.csv
id,1383799600,1383799601,1383799605,1383799621,1383799714
1,sensor-on,sensor-ready,flow-out,flow-interrupt,sensor-killAll
Create a traditional wide row. (CQL suggests not using COMPACT STORAGE, but this is just to get you going quickly.)
cqlsh:mykeyspace> create table timeseries
                  (id text, timestamp text, data text,
                   primary key (id, timestamp))
                  with compact storage;
This is the altered code:
import csv
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('mykeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, "timeseries")

with open('data.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print str(row)
        key = row['id']
        del row['id']
        # each remaining CSV column header is an epoch-seconds timestamp
        # mapped to the reading at that instant
        for (timestamp, data) in row.iteritems():
            cf.insert(key, {timestamp: data})

pool.dispose()
This is your timeseries
cqlsh:mykeyspace> select * from timeseries;

 id | timestamp  | data
----+------------+----------------
  1 | 1383799600 | sensor-on
  1 | 1383799601 | sensor-ready
  1 | 1383799605 | flow-out
  1 | 1383799621 | flow-interrupt
  1 | 1383799714 | sensor-killAll
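On point (e) above, the epoch-seconds strings could be stored in a real CQL `timestamp` column (e.g. `timestamp timestamp` instead of `timestamp text` in the table definition) rather than text. A minimal conversion sketch; the helper name is my own:

```python
from datetime import datetime, timezone

def to_timestamp(epoch_str):
    """Turn an epoch-seconds CSV header such as '1383799600' into an
    aware datetime, which CQL drivers can bind to a `timestamp` column."""
    return datetime.fromtimestamp(int(epoch_str), tz=timezone.utc)
```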