How to read a huge CSV file from Google Cloud Storage line by line using Java?

Problem description

I'm new to Google Cloud Platform. I'm trying to read a CSV file of about 1 GB, line by line, from Google Cloud Storage (a non-public bucket accessed via a service account key).

I couldn't find any option to read a file in Google Cloud Storage (GCS) line by line; I only see options to read by chunk size or byte size. Since I'm reading a CSV, I don't want to read by chunk size, because a chunk boundary may split a record.
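
Reading in chunks does not by itself have to split records: if a line reader sits on top of the chunked byte stream, it carries a partial line over to the next chunk. The sketch below uses only the standard library and checks this with a hypothetical `TinyChunkChannel` that stands in for a GCS `ReadChannel` and hands out just a few bytes per `read()` call; wrapping it with `Channels.newReader` and `BufferedReader` still yields whole lines. One caveat: if CSV fields can contain quoted newlines, a line is not always a record, and a CSV parser would be needed on top of the reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkBoundaryDemo {

    // Hypothetical stand-in for a GCS ReadChannel: returns at most `chunkSize` bytes per read() call,
    // so CSV lines straddle chunk boundaries.
    static final class TinyChunkChannel implements ReadableByteChannel {
        private final byte[] data;
        private final int chunkSize;
        private int pos = 0;
        private boolean open = true;

        TinyChunkChannel(byte[] data, int chunkSize) {
            this.data = data;
            this.chunkSize = chunkSize;
        }

        @Override
        public int read(ByteBuffer dst) {
            if (pos >= data.length) {
                return -1; // end of stream
            }
            int n = Math.min(chunkSize, Math.min(dst.remaining(), data.length - pos));
            dst.put(data, pos, n);
            pos += n;
            return n;
        }

        @Override
        public boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    // Reads every line from the channel; BufferedReader keeps partial lines between chunks.
    static List<String> readLines(ReadableByteChannel channel) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(Channels.newReader(channel, "UTF-8"))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] csv = "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8);
        // 3-byte chunks split "1,alice" and "2,bob" across several read() calls.
        System.out.println(readLines(new TinyChunkChannel(csv, 3)));
    }
}
```

Running `main` prints `[id,name, 1,alice, 2,bob]`, even though each `read()` returns only 3 bytes.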

Solutions tried so far: I tried copying the contents of the CSV file from GCS to a temporary local file with the code below, and then reading the temp file. This works as expected, but I don't want to copy a huge file to my local instance. Instead, I want to read it from GCS line by line.

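    import com.google.cloud.ReadChannel;
    import com.google.cloud.storage.Blob;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageOptions;
    import java.io.FileOutputStream;

    StorageOptions options = StorageOptions.newBuilder().setProjectId(GCP_PROJECT_ID)
            .setCredentials(gcsConfig.getCredentials()).build();
    Storage storage = options.getService();
    Blob blob = storage.get(BUCKET_NAME, FILE_NAME);
    // Copy the whole object into a local temp file; both channels are closed even on failure
    try (ReadChannel readChannel = blob.reader();
         FileOutputStream fileOutputStream = new FileOutputStream(TEMP_FILE_NAME)) {
        fileOutputStream.getChannel().transferFrom(readChannel, 0, Long.MAX_VALUE);
    }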

Please suggest an approach.

Recommended answer

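Since I'm doing batch processing, I put the code below in my ItemReader's init() method, which is annotated with @PostConstruct. In my ItemReader's read() method, I build a List whose size equals the chunk size. This way I read lines in batches of chunkSize instead of reading all the lines at once. The key step is Channels.newReader(), which adapts the GCS ReadChannel into a java.io.Reader, so BufferedReader.readLine() returns whole lines while the object itself is still fetched in chunks.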

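import com.google.cloud.ReadChannel;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.BufferedReader;
import java.nio.channels.Channels;

StorageOptions options = StorageOptions.newBuilder().setProjectId(GCP_PROJECT_ID)
        .setCredentials(gcsConfig.getCredentials()).build();
Storage storage = options.getService();
Blob blob = storage.get(BUCKET_NAME, FILE_NAME);
ReadChannel readChannel = blob.reader();
// Stream the object as text; keep br open so read() can pull lines on each call
BufferedReader br = new BufferedReader(Channels.newReader(readChannel, "UTF-8"));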
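
The answer describes its read() method but does not show it. A minimal sketch of that idea, assuming the `BufferedReader` from the snippet above is passed in and using a hypothetical class name `ChunkedLineReader`: each call returns up to `chunkSize` lines, and `null` once the input is exhausted, which is how a Spring Batch `ItemReader` signals end of input. It leaves out the Spring dependency so it runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ChunkedLineReader {

    private final BufferedReader br;
    private final int chunkSize;

    // In the answer, br wraps the GCS ReadChannel; any BufferedReader works here.
    public ChunkedLineReader(BufferedReader br, int chunkSize) {
        this.br = br;
        this.chunkSize = chunkSize;
    }

    // Returns up to chunkSize lines, or null once the stream is exhausted.
    public List<String> read() throws IOException {
        List<String> chunk = new ArrayList<>(chunkSize);
        String line;
        // Check the size first so no line is read and then dropped.
        while (chunk.size() < chunkSize && (line = br.readLine()) != null) {
            chunk.add(line);
        }
        return chunk.isEmpty() ? null : chunk;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new StringReader("a\nb\nc\nd\ne\n"));
        ChunkedLineReader reader = new ChunkedLineReader(br, 2);
        List<String> chunk;
        while ((chunk = reader.read()) != null) {
            System.out.println(chunk);
        }
    }
}
```

With `chunkSize = 2` and five lines, `main` prints `[a, b]`, `[c, d]`, then `[e]`. The reader stays open across calls; closing it when the job finishes (for example in a `@PreDestroy` method) is not shown.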
