Adding a UUID to each row imported from a CSV file
Question
We want to import 100,000 rows from a .csv file into a Cassandra table.
The rows contain no unique value, so we want to add a UUID to each imported row. How can we do this automatically while importing the data from the CSV file?
Sample rows from the .csv file (the first line holds the column names):
DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879
We want to add a UUID to each row, like this:
UID,DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
c37d661d-7e61-49ea-96a5-68c34e83db3a,2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879
Recommended answer
There's no way to do that directly with CQL's COPY command, but you can process the CSV file outside of Cassandra first.
For example, here is a Python script that reads in.csv, appends a UUID column to each row, and writes the result to out.csv:
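#!/usr/bin/env python3
# Read in.csv and write out.csv with one extra UUID column appended to each row.
import csv
import uuid

# newline='' lets the csv module handle line endings itself (Python 3).
with open('in.csv', newline='') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter=',', quotechar='"')
    writer = csv.writer(fout, delimiter=',', quotechar='"')
    firstrow = True
    for row in reader:
        if firstrow:
            # Header row: add the name of the new column.
            row.append('UUID')
            firstrow = False
        else:
            # Data row: add a random (version 4) UUID as text.
            row.append(str(uuid.uuid4()))
        writer.writerow(row)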
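The script above puts the UUID in the last column, whereas the sample in the question shows it in the first column. The following is a minimal sketch of that variant, not a definitive implementation. The function name prepend_uuid and the default column name "UID" are assumptions made for illustration; the demonstration runs on in-memory text rather than real files.

```python
import csv
import io
import uuid


def prepend_uuid(src, dst, column_name="UID"):
    """Copy CSV rows from src to dst, adding a UUID as the first column.

    src and dst are text-mode file objects. column_name is a placeholder;
    use whatever name your Cassandra schema expects.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for index, row in enumerate(reader):
        if index == 0:
            # Header row: put the new column name in front.
            writer.writerow([column_name] + row)
        else:
            # Data row: put a fresh random (version 4) UUID in front.
            writer.writerow([str(uuid.uuid4())] + row)


# Small in-memory demonstration using two columns from the sample file.
sample = io.StringIO("DateTime,Latitude\n2014-09-11T12:36:11.000+00:00,67.689\n")
result = io.StringIO()
prepend_uuid(sample, result)
print(result.getvalue())
```

To process real files, pass objects opened with open('in.csv', newline='') and open('out.csv', 'w', newline='') instead of the StringIO buffers.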
The resulting file can then be imported using CQL COPY (after you've created your schema accordingly). If you use this example, make sure to read up on Python's uuid functions to choose the one you need (probably uuid1 or uuid4).
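To make the choice between uuid1 and uuid4 more concrete, here is a short sketch of how the two differ. The variable names are illustrative only. As far as the schema is concerned, a Cassandra column declared as timeuuid expects version 1 values, while a plain uuid column accepts either version, so check which type your table uses.

```python
import uuid

# Version 1: derived from the current time and a node identifier.
time_based = uuid.uuid1()
# Version 4: derived from random bits, carries no timestamp.
random_based = uuid.uuid4()

print(time_based.version, time_based)
print(random_based.version, random_based)

# A version 1 UUID embeds its creation time as a count of
# 100-nanosecond intervals since 1582-10-15.
print(time_based.time)
```

If the rows do not need to be ordered by insertion time, uuid4 is usually the simpler choice because it reveals nothing about the machine or moment that generated it.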