COPY cassandra table from csv file

Question

I'm setting up a demo landscape for Cassandra, Apache Spark and Flume on my Mac (Mac OS X Yosemite with Oracle jdk1.7.0_55). The landscape shall work as a proof of concept for a new analytics platform and therefore I need some test data in my cassandra db. I am using cassandra 2.0.8.

I created some demo data in Excel and exported it as a CSV file. The structure is like this:

ProcessUUID;ProcessID;ProcessNumber;ProcessName;ProcessStartTime;ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus;Orderer;VorgangsNummer;VehicleID;FIN;Reference;ReferenceType
0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKl‰rfall;1;Abmeldung Kl‰rfall;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;Finished;SIXT;4278;A-XA 1;WAU2345CX67890876;KLA-BR4278;internal

I then created a keyspace and a column family in cqlsh using:

CREATE KEYSPACE dadcargate 
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };

use dadcargate;

CREATE COLUMNFAMILY Process (
  ProcessUUID uuid, ProcessID varchar, ProcessNumber bigint, ProcessName varchar, 
  ProcessStartTime timestamp, ProcessStartTimeUUID timeuuid, ProcessEndTime timestamp, 
  ProcessEndTimeUUID timeuuid, ProcessStatus varchar, Orderer varchar,
  VorgangsNummer varchar, VehicleID varchar, FIN varchar, Reference varchar,
  ReferenceType varchar, 
PRIMARY KEY (ProcessUUID))
WITH COMMENT='A process is like a bracket around multiple process steps';

The column family name and all of its columns end up in lower case. I will have to look into that as well some day, but it is not relevant at the moment.

Now I take my CSV file, which has around 1600 entries, and want to import it into my table named process like this:

cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname, 
processstarttime, processendtime, processstatus, orderer, vorgangsnummer, vehicleid,
fin, reference, referencetype) 
FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

It gives the following error:

Record #0 (line 1) has the wrong number of fields (15 instead of 13).
0 rows imported in 0.050 seconds.

Which is essentially true, as I do NOT have the timeUUID fields in my CSV export.
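
The numbers in that error can be checked outside cqlsh. The following is only a rough sketch: it parses the header and the sample record shown above with Python's csv module and compares the field count with the 13 columns named in the COPY command. The helper name field_counts is made up for this illustration.

```python
import csv
import io

# Header and first record as shown in the question.
SAMPLE = (
    "ProcessUUID;ProcessID;ProcessNumber;ProcessName;ProcessStartTime;"
    "ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus;"
    "Orderer;VorgangsNummer;VehicleID;FIN;Reference;ReferenceType\n"
    "0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKl‰rfall;1;Abmeldung Kl‰rfall;"
    "2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;Finished;SIXT;4278;A-XA 1;"
    "WAU2345CX67890876;KLA-BR4278;internal\n"
)

# The 13 columns named in the failing COPY command.
COPY_COLUMNS = [
    "processuuid", "processid", "processnumber", "processname",
    "processstarttime", "processendtime", "processstatus", "orderer",
    "vorgangsnummer", "vehicleid", "fin", "reference", "referencetype",
]


def field_counts(text, delimiter=";"):
    """Return the number of fields in each record of the CSV text (hypothetical helper)."""
    return [len(row) for row in csv.reader(io.StringIO(text), delimiter=delimiter)]


# Empty fields between two delimiters still count as fields.
print(field_counts(SAMPLE), len(COPY_COLUMNS))  # prints: [15, 15] 13
```

The two empty timeUUID values sit between consecutive semicolons, so every record still carries 15 fields even though only 13 of them hold data.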

If I try the COPY command without explicit column names, like this (given that I actually am missing two fields):

cqlsh:dadcargate> COPY process from 'Process_BulkData.csv' 
WITH DELIMITER = ';' AND HEADER = TRUE;

I end up with another error:

Bad Request: Input length = 1
Aborting import at record #0 (line 1). Previously-inserted values still present.
0 rows imported in 0.009 seconds.

Hm. Kinda strange, but okay. Maybe the COPY command does not like the fact that there are two fields missing. I still find this strange, as the missing fields are of course there (from a structural point of view), just empty.

I still have another shot: I deleted the missing columns in Excel, exported the file again as CSV, and tried to import WITHOUT a header line in my CSV BUT with explicit column names, like this:

cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname, 
processstarttime, processendtime, processstatus, orderer, vorgangsnummer, vehicleid, 
fin, reference, referencetype) 
FROM 'Process_BulkData-2.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

I get this error:

Bad Request: Input length = 1
Aborting import at record #0 (line 1). Previously-inserted values still present.
0 rows imported in 0.034 seconds.

Can ANYONE tell me what I'm doing wrong here? According to the documentation of the COPY command, the way I set up my commands should work for at least two of them. Or so I would think.

But nah, I'm obviously missing something important here.

Recommended answer

cqlsh's COPY command can be touchy. However, the COPY documentation contains this line:

The number of columns in the CSV input is the same as the number of columns in the Cassandra table metadata.

Keeping that in mind, I did manage to get your data to import with COPY FROM by naming the empty fields (processstarttimeuuid and processendtimeuuid, respectively):

aploetz@cqlsh:stackoverflow> COPY process (processuuid, processid, processnumber, 
processname, processstarttime, processstarttimeuuid, processendtime, 
processendtimeuuid, processstatus, orderer, vorgangsnummer, vehicleid, fin, reference, 
referencetype) FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;

1 rows imported in 0.018 seconds.
aploetz@cqlsh:stackoverflow> SELECT * FROM process ;

 processuuid                          | fin               | orderer | processendtime            | processendtimeuuid | processid         | processname        | processnumber | processstarttime          | processstarttimeuuid | processstatus | reference  | referencetype | vehicleid | vorgangsnummer
--------------------------------------+-------------------+---------+---------------------------+--------------------+-------------------+--------------------+---------------+---------------------------+----------------------+---------------+------------+---------------+-----------+----------------
 0f0d1498-d149-4fcc-87c9-f12783fdf769 | WAU2345CX67890876 |    SIXT | 2011-02-16 22:05:00+-0600 |               null | AbmeldungKl‰rfall | Abmeldung Kl‰rfall |             1 | 2011-02-02 22:05:00+-0600 |                 null |      Finished | KLA-BR4278 |      internal |    A-XA 1 |           4278

(1 rows)
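
For a file with many columns, the column list does not have to be typed by hand. As a minimal sketch, assuming the CSV header names match the table's column names apart from case (as they do in the question), a small Python helper could derive the COPY statement from the header line. The function name build_copy_statement is hypothetical and not part of cqlsh.

```python
def build_copy_statement(header_line, table, csv_path, delimiter=";"):
    """Build a cqlsh COPY ... FROM statement that names every CSV column.

    Assumption: header names equal the table's column names apart from case.
    They are lower-cased because the question reports that all column names
    in the table ended up lower case.
    """
    columns = [name.strip().lower() for name in header_line.split(delimiter)]
    return (
        f"COPY {table} ({', '.join(columns)}) "
        f"FROM '{csv_path}' WITH DELIMITER = '{delimiter}' AND HEADER = TRUE;"
    )


# Header line as shown in the question.
HEADER = (
    "ProcessUUID;ProcessID;ProcessNumber;ProcessName;ProcessStartTime;"
    "ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus;"
    "Orderer;VorgangsNummer;VehicleID;FIN;Reference;ReferenceType"
)

print(build_copy_statement(HEADER, "process", "Process_BulkData.csv"))
```

The printed statement lists all 15 columns in file order, including the two timeUUID columns whose values are empty, which is the shape of the command that imported successfully above.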
