Adding a UUID to each row imported from a CSV file

Question

We want to import 100 thousand rows from a .csv file into a Cassandra table.

No column holds a unique value for each row, so we want to add a UUID to every imported row. How can we do this automatically while importing the data from the CSV file?

Sample rows from the .csv file (the first row contains the column names):

DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879

We want to add a UUID to each row, like this:

UID,DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
c37d661d-7e61-49ea-96a5-68c34e83db3a,2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879

Answer

There's no way to do that directly from CQL's COPY command, but instead you could process the CSV file outside of Cassandra first.

For example, here's a Python script that reads in.csv, appends a UUID column to each row, and writes the result to out.csv:

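#!/usr/bin/env python3
# Read in.csv and write out.csv with one extra UUID column appended to each row.

import csv
import uuid

# The csv module expects files opened in text mode with newline=''.
# The with-statement closes both files (and flushes out.csv) when done.
with open('in.csv', newline='') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter=',', quotechar='"')
    writer = csv.writer(fout, delimiter=',', quotechar='"')

    firstrow = True
    for row in reader:
        if firstrow:
            # Header row: add the name of the new column.
            row.append('UUID')
            firstrow = False
        else:
            # Data row: add a random (version 4) UUID as a string.
            row.append(str(uuid.uuid4()))
        writer.writerow(row)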
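The question's sample output puts the new column first (UID before DateTime), whereas the script above appends it last. Below is a minimal sketch of a variant that prepends the column instead, written as a function over file-like objects so it can be tried on an in-memory sample. The function name add_uuid_column and the default header name UID are illustrative choices, not part of the original answer.

```python
import csv
import io
import uuid


def add_uuid_column(fin, fout, header_name='UID'):
    """Copy CSV rows from fin to fout, prepending a random UUID to each data row.

    Hypothetical helper: the first row is treated as the header and gets
    header_name prepended instead of a UUID. Returns the number of data rows.
    """
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    count = 0
    for i, row in enumerate(reader):
        if i == 0:
            # Header row: new column name goes first.
            writer.writerow([header_name] + row)
        else:
            # Data row: a fresh version 4 UUID goes first.
            writer.writerow([str(uuid.uuid4())] + row)
            count += 1
    return count


if __name__ == '__main__':
    # Usage example on the sample rows from the question.
    sample = (
        'DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,'
        'Gap,Distance,RMS,Source,EventID,Version\n'
        '2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,'
        '0.79,ak,ak11387003,1410441826879\n'
    )
    out = io.StringIO()
    add_uuid_column(io.StringIO(sample), out)
    print(out.getvalue())
```

Because it reads and writes one row at a time, memory use stays flat even for the 100 thousand rows mentioned in the question. When using it with real files, open both with newline='', as the csv module documentation recommends.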

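The resulting file can then be imported with cqlsh's COPY command, once you have created a table whose columns match the file. Since the UUID is now the last field in each row, it is safest to list the columns explicitly, in file order, in the COPY statement and to pass WITH HEADER = true so the header row is skipped. If you use this example, read up on the functions in Python's uuid module to choose the one you need (probably uuid1 or uuid4).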
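To illustrate the choice: uuid1() builds a version 1 UUID from the current time and a node identifier (normally the host's network address, which the Python docs flag as a privacy concern), and version 1 values are what Cassandra's timeuuid type stores. uuid4() is random and fits a plain uuid column. The short check below uses only attributes of Python's uuid.UUID; which CQL type your column uses is an assumption about your schema.

```python
import time
import uuid

# Version 1: time-based; the timestamp can be recovered from the value.
u1 = uuid.uuid1()
# Version 4: random; carries no timestamp or host information.
u4 = uuid.uuid4()

print(u1, u1.version)
print(u4, u4.version)

# u1.time is the 60-bit timestamp, counted in 100-ns intervals since
# 1582-10-15 (the UUID epoch). Convert it to Unix seconds.
UUID_EPOCH_OFFSET = 0x01B21DD213814000  # 100-ns intervals from 1582-10-15 to 1970-01-01
unix_seconds = (u1.time - UUID_EPOCH_OFFSET) / 1e7
print(f'u1 was generated {time.time() - unix_seconds:.3f} s ago')
```

If the rows should be sortable by insertion time inside a partition, a version 1 value in a timeuuid column is the usual choice; if you only need a unique key, uuid4 avoids embedding host details in every row.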
