Output data appears in a random order when uploaded to Google Cloud Storage

Question

I've been using the google-dataflow-sdk to upload CSV files to Google Cloud Storage. When I upload a file to a Google Cloud project, its data appears in the resulting file in a random order. Each line of the CSV is correct, but the rows are all over the place.

The CSV header (i.e. attribute, attribute, attribute) always ends up on some other line and never at the top, where it should be. I stress again: the data in each column is fine; only the rows are randomly positioned.

Here is the code that reads the data initially:

PCollection<String> csvData = pipeline.apply(TextIO.Read.named("ReadItems")
                                             .from(filename));

And this is the code that writes to the Google Cloud project:

csvData.apply(TextIO.Write.named("WriteToCloud")
                          .to("gs://dbm-poc/"+partnerId+"/"+dateOfReport+modifiedFileName)
                          .withSuffix(".csv"));

Thanks for your help.

Recommended answer

While I agree that the answer provided by Graham Polley is correct, I managed to find a much simpler way to get the data written in order.

Instead, I used the Google Cloud Storage library to store the files I needed in the cloud, like so:

// Uploads the given bytes as a single object, so the content is stored exactly as it was read.
public static String writeFile(byte[] content, String filename, String partnerId, String dateOfReport) {
    Storage storage = StorageOptions.defaultInstance().service();
    // Object name inside the "dbm-poc" bucket: <partnerId>/<dateOfReport>-<filename>.csv
    BlobId blobId = BlobId.of("dbm-poc", partnerId + "/" + dateOfReport + "-" + filename + ".csv");
    BlobInfo blobInfo = BlobInfo.builder(blobId).contentType("binary/octet-stream").build();
    storage.create(blobInfo, content);

    return filename;
}

// Reads the whole local file into memory in one call.
public static byte[] readFile(String filename) throws IOException {
    return Files.readAllBytes(Paths.get(filename));
}

Using these two methods together, I was not only able to upload the files to the bucket I wanted without losing the ordering of their contents, but I was also able to change the format of the uploaded files from text to binary/octet-stream, which means they can be accessed and downloaded.

This method also seems to remove the need for a pipeline to upload the data.
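
To illustrate how the two methods fit together without a pipeline, here is a minimal, self-contained sketch. It is not the code I actually ran: the class name OrderedUploadSketch, the in-memory FAKE_BUCKET map, and the sample rows are all hypothetical stand-ins so that the example runs without credentials, and writeFile here returns the full object name rather than the filename. In real use the map would be replaced by the storage.create(blobInfo, content) call shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderedUploadSketch {

    // Hypothetical stand-in for a bucket: object name -> object bytes.
    // A real upload would call storage.create(blobInfo, content) instead.
    public static final Map<String, byte[]> FAKE_BUCKET = new HashMap<>();

    // Uses the same naming scheme as writeFile above, but stores into the in-memory map.
    public static String writeFile(byte[] content, String filename, String partnerId, String dateOfReport) {
        String objectName = partnerId + "/" + dateOfReport + "-" + filename + ".csv";
        FAKE_BUCKET.put(objectName, content);
        return objectName;
    }

    // Reads the whole file as one byte array, so the line order is left untouched.
    public static byte[] readFile(String filename) throws IOException {
        return Files.readAllBytes(Paths.get(filename));
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample CSV (made-up data) with the header on the first line.
        Path csv = Files.createTempFile("report", ".csv");
        List<String> rows = Arrays.asList("id,name,clicks", "1,alpha,10", "2,beta,20", "3,gamma,30");
        Files.write(csv, rows, StandardCharsets.UTF_8);

        // Read the file and "upload" it as a single object.
        String objectName = writeFile(readFile(csv.toString()), "report", "partner-1", "2016-01-01");

        // Compare the stored rows with the original rows, in order.
        String stored = new String(FAKE_BUCKET.get(objectName), StandardCharsets.UTF_8);
        List<String> storedRows = Arrays.asList(stored.split("\\R"));
        System.out.println(objectName);
        System.out.println(storedRows.equals(rows));

        Files.delete(csv);
    }
}
```

Because the file travels as one byte array and is stored as one object, nothing along the way has a chance to reorder the rows, which is why the header stays on the first line.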
