Move file from /tmp folder to Google Cloud Storage bucket


Question

I originally posted this question when I was having trouble getting my python cloud function to create and write to a new file. Since then I've managed to create a csv in the /tmp directory but am struggling to find a way to move that file into my bucket's folder where the original csv was uploaded.

Is it possible to do this? I've looked through the Google Cloud Storage docs and tried using the blob.download_to_filename() and bucket.copy_blob() methods but am currently getting the error: FileNotFoundError: [Errno 2] No such file or directory: 'my-project.appspot.com/my-folder/my-converted-file.csv'
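A note on the two methods mentioned above: blob.download_to_filename() writes a bucket object *to* a local path, and bucket.copy_blob() copies an object *between* buckets; neither uploads a local file, which is why passing a GCS-style path produces a FileNotFoundError. Uploading a local file is done with blob.upload_from_filename(). A minimal sketch, with hypothetical bucket and object names (the lazy import is only so the local-path check runs even without GCS credentials):

```python
import os


def upload_tmp_file(local_path: str, bucket_name: str, object_name: str) -> str:
    """Upload a local file (e.g. one written to /tmp) to a GCS bucket object."""
    # Fail early with a clear error if the local file does not exist
    if not os.path.exists(local_path):
        raise FileNotFoundError(local_path)
    # Imported lazily so the path check above works without GCS credentials
    from google.cloud import storage
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path, content_type="text/csv")
    return f"gs://{bucket_name}/{object_name}"
```

For example, `upload_tmp_file("/tmp/my-converted-file.csv", "my-project.appspot.com", "my-folder/my-converted-file.csv")` would place the converted file next to the original upload.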

Any help or advice is appreciated!

Answer

to move that file into my bucket

Here is an example. Bear in mind:

  1. Don't copy and paste without thinking.
  2. The snippet is only meant to show the idea - it won't work as is. Modifications are required to fit your environment and requirements.
  3. The _crc32sum function is not my own development.
  4. I did not test the code; I only assembled elements from different public sources.

Here is the code:


import base64
import crc32c
import os

from google.cloud import exceptions
from google.cloud import storage

# =====> ==============================
# a function to calculate crc32c hash
def _crc32sum(filename: str, blocksize: int = 65536) -> int:
    """Calculate the crc32c hash for a file with the provided name

    :param filename: the name of the file
    :param blocksize: the size of the block for the file reading
    :return: the calculated crc32c hash for the given file
    """
    checksum = 0
    with open(filename, "rb") as f_ref:
        for block in iter(lambda: f_ref.read(blocksize), b""):
            # crc32c.crc32c() is the current API; crc32() is a deprecated alias
            checksum = crc32c.crc32c(block, checksum)
    return checksum & 0xffffffff
# =====> ==============================

# use the default project in the client initialisation
CS = storage.Client()

lcl_file_name = "/tmp/my-local-file.csv"

tgt_bucket_name = "my-bucket-name"
tgt_object_name = "prefix/another-prefix/my-target-file.csv"

# =====> ==============================
# =====> ==============================
# =====> the process starts here

# https://googleapis.dev/python/storage/latest/_modules/google/cloud/storage/client.html#Client.lookup_bucket
gcs_tgt_bucket_ref = CS.lookup_bucket(tgt_bucket_name)

# check if the target bucket does exist
if gcs_tgt_bucket_ref is None:
    # handle incorrect bucket name or its absence
    # most likely we are to finish the execution here rather than 'pass'
    pass

# calculate the hash for the local file
lcl_crc32c = _crc32sum(lcl_file_name)
base64_crc32c = base64.b64encode(lcl_crc32c.to_bytes(
    length=4, byteorder='big')).decode('utf-8')

# check if the file/object in the bucket already exists
# https://googleapis.dev/python/storage/latest/_modules/google/cloud/storage/bucket.html#Bucket.blob
gcs_file_ref = gcs_tgt_bucket_ref.blob(tgt_object_name)

# https://googleapis.dev/python/storage/latest/_modules/google/cloud/storage/blob.html#Blob.exists
if gcs_file_ref.exists():
    gcs_file_ref.reload()
    # compare crc32c hashes - between the local file and the gcs file/object
    if base64_crc32c != gcs_file_ref.crc32c:
        # the blob file/object in the GCS has a different hash
        # the blob file/object should be deleted and a new one to be uploaded
        # https://googleapis.dev/python/storage/latest/_modules/google/cloud/storage/blob.html#Blob.delete
        gcs_file_ref.delete()
    else:
        # the file/object is already in the bucket
        # most likely we are to finish the execution here rather than 'pass'
        pass

# upload file to the target bucket
# reinit the reference in case the target file/object was deleted
gcs_file_ref = gcs_tgt_bucket_ref.blob(tgt_object_name)
gcs_file_ref.crc32c = base64_crc32c

with open(lcl_file_name, 'rb') as file_obj:
    try:
        gcs_file_ref.metadata = {
            "custom-metadata-key": "custom-metadata-value"
        }
        # https://googleapis.dev/python/storage/latest/_modules/google/cloud/storage/blob.html#Blob.upload_from_file
        gcs_file_ref.upload_from_file(
            file_obj=file_obj, content_type="text/csv", checksum="crc32c")
    except exceptions.GoogleCloudError as gc_err:
        # handle the exception here
        # don't forget to delete the local file if it is not required anymore
        # most likely we are to finish the execution here rather than 'pass'
        pass

# clean behind
if lcl_file_name and os.path.exists(lcl_file_name):
    os.remove(lcl_file_name)

# =====> the process ends here
# =====> ==============================
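The base64 step in the snippet above converts the integer checksum into the big-endian, base64-encoded form that GCS stores in blob.crc32c. That conversion is pure standard library and can be isolated as a small helper (the function names are mine, not part of the answer's code):

```python
import base64


def checksum_to_gcs_format(checksum: int) -> str:
    """Encode a crc32c checksum the way GCS reports it: 4 big-endian bytes, base64."""
    return base64.b64encode(
        checksum.to_bytes(length=4, byteorder="big")).decode("utf-8")


def gcs_format_to_checksum(encoded: str) -> int:
    """Inverse: decode a GCS-style crc32c string back to an integer."""
    return int.from_bytes(base64.b64decode(encoded), byteorder="big")
```

The two functions round-trip, which makes it easy to compare a locally computed hash against the value GCS reports for an uploaded object.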

Let me know if there are significant mistakes, and I will modify the example.
