Adding a UUID to each row being imported from a CSV file


Question

We want to import 100,000 rows from a .csv file into a Cassandra table.

No row contains a value that is guaranteed to be unique, so we want to add a UUID to each imported row. How can we do this automatically while importing the data from the CSV file?

Sample rows from the .csv file (the first row contains the column names):

DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879

We want to add a UUID to each row, as follows:

UID,DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
c37d661d-7e61-49ea-96a5-68c34e83db3a,2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879


Recommended answer

There is no way to do this directly with CQL's COPY command, but you can process the CSV file outside of Cassandra first.

For example, here is a Python script that reads from the file in.csv, appends a UUID column to each row, and writes the result to out.csv:

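#!/usr/bin/env python3
# Read in.csv and write out.csv with one extra column holding a UUID.

import csv
import uuid

# newline='' lets the csv module handle line endings itself (Python 3).
with open('in.csv', newline='') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter=',', quotechar='"')
    writer = csv.writer(fout, delimiter=',', quotechar='"')

    firstrow = True
    for row in reader:
        if firstrow:
            # Header row: add the name of the new column.
            row.append('UUID')
            firstrow = False
        else:
            # Data row: add a freshly generated random UUID.
            row.append(str(uuid.uuid4()))
        writer.writerow(row)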
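
The script above appends the UUID as the last column, whereas the layout shown in the question has it in the first column. A minimal sketch of a variant that prepends it instead, assuming Python 3; the function name `add_uuid_column` and its `column_name` parameter are hypothetical, chosen only for illustration:

```python
import csv
import io
import uuid


def add_uuid_column(src, dst, column_name="UID"):
    """Copy CSV rows from src to dst, prepending a UUID to each data row.

    src and dst are open text-mode file objects. The first row of src is
    treated as the header. Returns the number of data rows written.
    (Hypothetical helper, not part of the original answer.)
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return 0  # empty input: nothing to write
    writer.writerow([column_name] + header)
    count = 0
    for row in reader:
        # uuid4() generates a random UUID; str() gives the canonical form.
        writer.writerow([str(uuid.uuid4())] + row)
        count += 1
    return count


# In-memory demonstration using the sample rows from the question.
sample = (
    "DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,"
    "Distance,RMS,Source,EventID,Version\n"
    "2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,"
    "ak11387003,1410441826879\n"
)
out = io.StringIO()
rows_written = add_uuid_column(io.StringIO(sample), out)
print(out.getvalue())
```

For real files, the same function could be called with `open('in.csv', newline='')` and `open('out.csv', 'w', newline='')` in place of the `io.StringIO` objects.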

The resulting file can then be imported using CQL COPY (after you have created your schema accordingly). If you use this example, make sure to read up on Python's uuid functions to choose the one you need (probably uuid1 or uuid4).
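
To make the choice between the two functions more concrete, here is a small sketch of how they differ. `uuid.uuid1()` builds a version 1 UUID from the current time and a node identifier, while `uuid.uuid4()` builds a version 4 UUID from random bits. As far as the Cassandra side goes, the assumption here is that a `timeuuid` column expects version 1 values and a plain `uuid` column accepts either; check this against the schema you create.

```python
import uuid

# Version 1: derived from a timestamp plus a node identifier.
time_based = uuid.uuid1()

# Version 4: derived from random bits only.
random_based = uuid.uuid4()

# The version attribute shows which variant was generated.
print(time_based, time_based.version)
print(random_based, random_based.version)
```

If the rows have no meaningful creation time of their own, the random version 4 form is usually the simpler choice for a surrogate key.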
