将数据从 Azure Blob 存储复制到 AWS S3 [英] Copy Data From Azure Blob Storage to AWS S3

查看:33
本文介绍了将数据从 Azure Blob 存储复制到 AWS S3的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 Azure 数据工厂的新手,有一个有趣的需求.

I am new to Azure Data Factory and have an interesting requirement.

我需要将文件从 Azure Blob 存储移动到 Amazon S3,最好使用 Azure 数据工厂.

I need to move files from Azure Blob storage to Amazon S3, ideally using Azure Data Factory.

但是 S3 不支持作为接收器;

However S3 isnt supported as a sink;

https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-overview

我还从我在此处阅读的各种评论中了解到,您不能直接从 Blob 存储复制到 S3 - 您需要在本地下载文件,然后将其上传到 S3.

I also understand from a variety of comments i've read on here that you cannot directly copy from Blob Storage to S3 - you would need to download the file locally and then upload it to S3.

有谁知道数据工厂、SSIS 或 Azure Runbook 中可以执行此类操作的任何示例,我想一个选项是编写从数据工厂调用的 azure 逻辑应用程序或函数.

Does anyone know of any examples, in Data factory, SSIS or Azure Runbook that can do such a thing, I suppose an option would be to write an azure logic-app or function that is called from Data Factory.

推荐答案

设法在这方面取得了一些进展 - 它可能对其他人有用.

Managed to get something working on this - it might be useful for someone else.

我决定编写一个使用 HTTP 请求作为触发器的 azure 函数.

I decided to write an azure function that uses a HTTP request as a trigger.

这两篇文章对我帮助很大;

These two posts helped me a lot;

如何使用 NuGet我的 Azure Functions 中的包?

使用 C# 从 Azure Blob 复制到 AWS S3

如果您使用的是 Azure 函数 2.x,请注意我对 Nuget 包的回答.

Please note my answer to the Nuget packages if you are using Azure functions 2.x.

这是代码 - 您可以根据需要修改其基础.我返回一个 JSON 序列化对象,因为 Azure 数据工厂需要它作为从管道发送的 http 请求的响应;

Here is the code - you can modify the basis of this to your needs. I return a JSON Serialized object because Azure Data Factory requires this as a response from a http request sent from a pipeline;

#r "Microsoft.WindowsAzure.Storage"
#r "Newtonsoft.Json"
#r "System.Net.Http"

using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using Newtonsoft.Json;
using Microsoft.WindowsAzure.Storage.Blob;
using System.Net.Http;
using Amazon.S3; 
using Amazon.S3.Model;
using Amazon.S3.Transfer;
using Amazon.S3.Util;


public static async  Task<IActionResult> Run(HttpRequest req, ILogger log)
{
    log.LogInformation("Example Function has recieved a HTTP Request");

    // get Params from query string
    string blobUri = req.Query["blobUri"];
    string bucketName = req.Query["bucketName"];

    // Validate query string
    if (String.IsNullOrEmpty(blobUri) || String.IsNullOrEmpty(bucketName)) {

        Result outcome = new Result("Invalid Parameters Passed to Function",false,"blobUri or bucketName is null or empty");
        return new BadRequestObjectResult(outcome.ConvertResultToJson());
    }

    // cast the blob to its type
    Uri blobAbsoluteUri = new Uri(blobUri);
    CloudBlockBlob blob = new CloudBlockBlob(blobAbsoluteUri);

    // Do the Copy
    bool resultBool = await CopyBlob(blob, bucketName, log);

    if (resultBool) { 
        Result outcome = new Result("Copy Completed",true,"Blob: " + blobUri + " Copied to Bucket: " + bucketName);
        return (ActionResult)new OkObjectResult(outcome.ConvertResultToJson());       
    }
    else {
        Result outcome = new Result("ERROR",false,"Copy was not successful Please review Application Logs");
        return new BadRequestObjectResult(outcome.ConvertResultToJson()); 
    }  
}

static async Task<bool> CopyBlob(CloudBlockBlob blob, string existingBucket, ILogger log) {

        var accessKey = "myAwsKey";
        var secretKey = "myAwsSecret";
        var keyName = blob.Name;

        // Make the client 
        AmazonS3Client myClient = new AmazonS3Client(accessKey, secretKey, Amazon.RegionEndpoint.EUWest1);

        // Check the Target Bucket Exists; 
        bool bucketExists = await AmazonS3Util.DoesS3BucketExistAsync (myClient,existingBucket);

        if (!bucketExists) {
            log.LogInformation("Bucket: " + existingBucket + " does not exist or is inaccessible to the application");
            return false;
        }

        // Set up the Transfer Utility
        TransferUtility fileTransferUtility = new TransferUtility(myClient);

        // Stream the file
        try {

            log.LogInformation("Starting Copy");

            using (var stream = await blob.OpenReadAsync()) {

                // Note: You need permissions to not be private on the source blob
                log.LogInformation("Streaming");

                await fileTransferUtility.UploadAsync(stream,existingBucket,keyName);

                log.LogInformation("Streaming Done");   
            }

            log.LogInformation("Copy completed");
        }
        catch (AmazonS3Exception e) {
                log.LogInformation("Error encountered on server. Message:'{0}' when writing an object", e.Message);
            }
        catch (Exception e) {
                log.LogInformation("Unknown encountered on server. Message:'{0}' when writing an object", e.Message);
                return false;
        }

        return true; 
    }

public class Result {

    public string result;
    public bool outcome;
    public string UTCtime;
    public string details; 

    public Result(string msg, bool outcomeBool, string fullMsg){
        result=msg;
        UTCtime=DateTime.Now.ToString("yyyy-MM-dd h:mm:ss tt");
        outcome=outcomeBool;
        details=fullMsg;
    }

    public string ConvertResultToJson() {
        return JsonConvert.SerializeObject(this);
    } 
}

这篇关于将数据从 Azure Blob 存储复制到 AWS S3的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆