How do I load a file from Cloud Storage into memory


Question

I have end users who are going to upload a csv file into a bucket, which will then be loaded into BigQuery. The issue is that the content of the data is unreliable: it contains free-text fields that may contain linefeeds, extra commas, invalid date formats, etc.

I have a Python script that will pre-process the file and write out a new one with all errors corrected.

I need to be able to automate this in the cloud. I was thinking I could load the contents of the file (it's only small) into memory, process the records, then write it back out to the bucket. I do not want to process the file locally.

Despite extensive searching, I can't find how to load a file in a bucket into memory and then write it back out again.

Can anyone help?

Answer

I believe what you're looking for is Google Cloud Functions. You can set a Cloud Function to be triggered by an upload to the GCS bucket, and use your Python code in the same Cloud Function to process the .csv and load it into BigQuery. However, bear in mind that Python 3.7.1 support for Cloud Functions is currently in a beta state of development.
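A minimal sketch of that approach, assuming the `google-cloud-storage` client library and a `google.storage.object.finalize` trigger. The `cleaned/` output prefix and the clean-up rules (flattening embedded newlines, dropping rows with a mismatched column count) are illustrative assumptions, not the asker's actual pre-processing script:

```python
import csv
import io


def clean_csv(text):
    """Normalise unreliable CSV text: flatten newlines embedded in quoted
    fields and drop rows whose column count does not match the header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    width = len(rows[0])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        if len(row) != width:
            continue  # skip rows broken by extra commas
        writer.writerow(cell.replace("\n", " ") for cell in row)
    return out.getvalue()


def process_upload(event, context):
    """Cloud Function entry point for a storage finalize trigger."""
    from google.cloud import storage  # deferred import: only needed in GCF

    client = storage.Client()
    bucket = client.bucket(event["bucket"])

    # Load the uploaded object's contents into memory as a string.
    raw = bucket.blob(event["name"]).download_as_text()
    cleaned = clean_csv(raw)

    # Write the corrected file back to the same bucket under a new prefix,
    # from which a BigQuery load job (or a second trigger) can pick it up.
    bucket.blob("cleaned/" + event["name"]).upload_from_string(
        cleaned, content_type="text/csv")
```

The corrected object could then be loaded into BigQuery from the `cleaned/` prefix, either by a load job started in the same function or by a second bucket-triggered function.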
