How to read a huge CSV file from Google Cloud Storage line by line using Java?

Question

I'm new to Google Cloud Platform. I'm trying to read a CSV file of around 1 GB, stored in Google Cloud Storage (a non-public bucket accessed via a service account key), line by line.

I couldn't find any option to read a file in Google Cloud Storage (GCS) line by line; I only see options to read by chunk size or byte size. Since I'm reading a CSV, I don't want to read by chunk size, because a chunk boundary may split a record.

Solutions tried so far: I copied the contents of the CSV file in GCS to a temporary local file and read the temp file, using the code below. The code works as expected, but I don't want to copy a huge file to my local instance. Instead, I want to read line by line from GCS.

    StorageOptions options = StorageOptions.newBuilder()
            .setProjectId(GCP_PROJECT_ID)
            .setCredentials(gcsConfig.getCredentials())
            .build();
    Storage storage = options.getService();
    Blob blob = storage.get(BUCKET_NAME, FILE_NAME);
    // Copy the whole object to a local temp file; try-with-resources closes both ends.
    try (ReadChannel readChannel = blob.reader();
         FileOutputStream fileOutputStream = new FileOutputStream(TEMP_FILE_NAME)) {
        fileOutputStream.getChannel().transferFrom(readChannel, 0, Long.MAX_VALUE);
    }

Please suggest an approach.

Answer

Since I'm doing batch processing, I use the code below in my ItemReader's init() method, which is annotated with @PostConstruct. In the ItemReader's read() method, I build a List whose size equals the chunk size. This way I read chunkSize lines at a time instead of reading all the lines at once.

    StorageOptions options = StorageOptions.newBuilder()
            .setProjectId(GCP_PROJECT_ID)
            .setCredentials(gcsConfig.getCredentials())
            .build();
    Storage storage = options.getService();
    Blob blob = storage.get(BUCKET_NAME, FILE_NAME);
    ReadChannel readChannel = blob.reader();
    // Wrap the channel in a BufferedReader so readLine() pulls data from GCS on demand.
    BufferedReader br = new BufferedReader(Channels.newReader(readChannel, "UTF-8"));
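
The read() side is not shown above, so here is a minimal sketch of how the chunked read might look. The class name ChunkedLineReader and the helper readChunk are hypothetical, not part of the original answer. To keep the example runnable without GCS credentials, an in-memory channel stands in for blob.reader(); since ReadChannel is a ReadableByteChannel, the same wrapping should apply to it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkedLineReader {

    // Hypothetical helper: reads up to chunkSize lines from the reader.
    // Returns an empty list once the reader is exhausted.
    static List<String> readChunk(BufferedReader reader, int chunkSize) throws IOException {
        List<String> chunk = new ArrayList<>();
        String line;
        // The size check comes first so no line is read and then dropped.
        while (chunk.size() < chunkSize && (line = reader.readLine()) != null) {
            chunk.add(line);
        }
        return chunk;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this in-memory channel stands in for blob.reader().
        byte[] csv = "id,name\n1,a\n2,b\n3,c\n".getBytes(StandardCharsets.UTF_8);
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(csv));

        // Closing the reader also closes the underlying channel.
        try (BufferedReader reader =
                new BufferedReader(Channels.newReader(channel, "UTF-8"))) {
            List<String> chunk;
            while (!(chunk = readChunk(reader, 2)).isEmpty()) {
                System.out.println(chunk);
            }
        }
    }
}
```

Note that readLine() splits on every line terminator, so a quoted CSV field containing an embedded newline would still be split across two lines; in that case a CSV parser reading from the same BufferedReader is probably the safer choice.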
