How to access csv file from Google Cloud Storage in a Google Cloud Function via Pandas?
Question
I'm new to Cloud Functions, so I followed the default GCP Cloud Function "hello world" tutorial. It worked fine and printed "hello world" as expected. I only changed the requirements.txt file to include pandas and google-cloud-storage. Likewise, all my edits to the main.py script were in the imports section before the function definition and in the `else` branch of the function.
requirements.txt:
```
pandas
google-cloud-storage
```
main.py:
```python
import pandas as pd
from google.cloud import storage

def hello_world(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    request_json = request.get_json()
    if request.args and 'message' in request.args:
        return request.args.get('message')
    elif request_json and 'message' in request_json:
        return request_json['message']
    else:
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        blob.download_to_filename('temp.csv')
        with open('temp.csv', 'rb') as f:
            df = pd.read_csv(f)
        return str(df.columns)
```
When I test the function in GCP's "test cloud function" area, the following errors are captured in the logs. The first 7 frames seem to be boilerplate, while the last two are specific to my program: `File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1183, in download_to_filename with open(filename, "wb") as file_obj: OSError: [Errno 30] Read-only file system: 'temp.csv'`. I have no idea why this error is being triggered.
Error:
```
Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/functions_framework/__init__.py", line 87, in view_func
    return function(request._get_current_object())
  File "/workspace/main.py", line 25, in hello_world
    blob.download_to_filename('temp.csv')
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1183, in download_to_filename
    with open(filename, "wb") as file_obj:
OSError: [Errno 30] Read-only file system: 'temp.csv'
```
For context, I've already added credentials to the appropriate service account, which this cloud function uses according to the configuration I set up. So, authorization aside, I have no idea why the function is failing. What should I change?
For context, I'm simply trying to open an arbitrary csv file from cloud storage in pandas and return the names of the columns as a string. This has no practical value, just a functional test before building something of value.
The specific IAM role given to the service account for this cloud function is 'roles/editor', which should be sufficient, as far as I know.
Edit2: It appears that GCP Cloud Functions operate in a read-only environment. So there must be some other way to open the file without using the `blob.download_to_filename` command.
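Following up on Edit2, one way to avoid writing to disk entirely is to download the object into memory and pass pandas a file-like buffer. This is a minimal sketch. It assumes a google-cloud-storage release that provides `Blob.download_as_bytes()` (older releases used `download_as_string()` instead). The helper names `read_csv_from_blob` and `read_csv_from_gcs` are hypothetical and used here for illustration only.

```python
import io

import pandas as pd


def read_csv_from_blob(blob) -> pd.DataFrame:
    """Parse a CSV stored in a GCS blob without touching the file system.

    `blob` is assumed to be a google.cloud.storage.Blob, or any object with
    a download_as_bytes() method that returns the object's raw bytes.
    """
    data = blob.download_as_bytes()  # the whole object is held in memory
    return pd.read_csv(io.BytesIO(data))


def read_csv_from_gcs(bucket_name: str, blob_name: str) -> pd.DataFrame:
    # Imported lazily so read_csv_from_blob stays usable without the GCS client.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return read_csv_from_blob(blob)
```

In the question's `else` branch, this would become `df = read_csv_from_gcs('my_bucket', 'my_file.csv')` followed by `return str(df.columns)`. The file still occupies memory, so the function's memory allocation must be large enough to hold it. Another option is to add `gcsfs` to requirements.txt. With that package installed, pandas can read a `gs://my_bucket/my_file.csv` URL directly.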
Recommended answer
You are new to Cloud Functions, and there are some things to know and some traps to avoid. One of them: Cloud Functions is stateless, so you can't write to the file system.
The exception is the `/tmp` directory. It's an in-memory file system, so size your Cloud Function's memory correctly: account for your app's memory footprint plus the size of the files stored in the `/tmp` directory.
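Files in `/tmp` consume the instance's memory and can persist between invocations on the same warm instance. For that reason, it helps to delete the downloaded file as soon as pandas has parsed it. This is a minimal sketch. It assumes the blob exposes `download_to_filename()` as in the question. The helper name `read_csv_via_tmp` is hypothetical.

```python
import os
import tempfile

import pandas as pd


def read_csv_via_tmp(blob, tmp_dir: str = "/tmp") -> pd.DataFrame:
    """Download `blob` into tmp_dir, parse it with pandas, then delete the file.

    In Cloud Functions, /tmp is the writable directory and is backed by the
    instance's memory, so leftover files reduce the memory available.
    """
    # A unique file name avoids clashes between requests on the same instance.
    fd, path = tempfile.mkstemp(suffix=".csv", dir=tmp_dir)
    os.close(fd)  # download_to_filename opens the path itself
    try:
        blob.download_to_filename(path)
        return pd.read_csv(path)
    finally:
        os.remove(path)  # release the memory the file was occupying
```

The `finally` clause runs after `pd.read_csv` has finished reading. The file is therefore removed whether parsing succeeds or raises.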
Update your Cloud Function like this:
```python
    # ... (if/elif branches unchanged from the question)
    else:
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        # /tmp is the writable (in-memory) directory in Cloud Functions
        blob.download_to_filename('/tmp/temp.csv')
        with open('/tmp/temp.csv', 'rb') as f:
            df = pd.read_csv(f)
        return str(df.columns)
```