General Azure ETL questions


Problem description


Hello,

I currently have a number of Windows Scheduled Tasks that execute python code to parse and load text files to a database.  With our company moving more to Azure, I'm now looking to migrate some of the ETL processes to Azure services.  It's been a steep learning curve thus far trying to understand the many, many pieces and services, and I have a very basic understanding of ADF, Workflows and Databricks.  I seem to be getting lost on specific implementation scenarios I require and some of the help is a tad 'light' on details for anything other than very basic scenarios.

For example, I currently have a web app that creates a folder with a date suffix and uploads multiple files to an Azure Blob Storage.  I would like to trigger the blob creation event to call some python to do some cleanup on the uploaded txt files by deleting some specific lines by line number, and then load this data into a SQL database.
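If the cleanup really is just "drop these line numbers", the Python side can stay very small regardless of which Azure service ends up running it. A minimal sketch of that step (the function name, sample data, and line numbers below are illustrative assumptions, not taken from your existing jobs):

```python
def clean_text(raw_text: str, lines_to_drop: set[int]) -> str:
    """Remove specific 1-based line numbers from the raw file contents."""
    kept = [
        line
        for number, line in enumerate(raw_text.splitlines(), start=1)
        if number not in lines_to_drop
    ]
    return "\n".join(kept)


# Hypothetical file contents: drop the header (line 1) and trailer (line 5).
sample = "header junk\ncol1|col2\na|1\nb|2\ntrailer junk"
print(clean_text(sample, {1, 5}))
```

A function like this could be hosted in an Azure Function with a blob trigger, or in a Databricks notebook invoked from an ADF pipeline, with the blob content downloaded and passed in as `raw_text`.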

I'm not sure if I should be building a Data Flow to handle all of this, or a simple Pipeline with a python Databrick, or if this should be done with Event Grids and Subscriptions?

I know there's probably many different ways to accomplish the above with any or all of the options I mentioned, but any suggestions/comments on accomplishing something like this would be greatly appreciated.


Warren M

Recommended answer

Hello Warren,  



This is a very broad scenario. The scenario you have mentioned can be handled using a Copy activity, but since we do not know what kind of cleaning needs to be done on the blobs before they are ingested into the SQL DB, we are not sure how well this will work for you; the Copy activity provides only limited cleaning options. You can also use a Databricks activity from a pipeline. It looks like the trigger you should use is an event-based trigger.
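Whichever service hosts the cleanup, the load step can hand the cleaned rows to the SQL DB with a parameterized insert. A hedged sketch (the delimiter, table name, and helper names are assumptions; the `executemany` call shown in the comment is pyodbc's, and would need a live connection):

```python
def parse_rows(cleaned_text: str, delimiter: str = "|") -> list[tuple[str, ...]]:
    """Split each non-empty cleaned line into a tuple of column values."""
    return [tuple(line.split(delimiter)) for line in cleaned_text.splitlines() if line]


def build_insert(table: str, column_count: int) -> str:
    """Build a parameterized INSERT statement with '?' placeholders (pyodbc style)."""
    placeholders = ", ".join("?" * column_count)
    return f"INSERT INTO {table} VALUES ({placeholders})"


rows = parse_rows("a|1\nb|2")
sql = build_insert("dbo.StagingTable", 2)
# With a real connection this would be: cursor.executemany(sql, rows)
print(sql, rows)
```

Parameterized inserts also avoid building SQL strings out of file contents, which matters when the text files come from an upload endpoint.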


We do not have any idea of the data size. We understand that Azure Databricks may turn out to be an expensive solution compared to ADF V2, but it is feature-rich, as you can use Python or Scala.
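For the event-based trigger itself, the general shape of an ADF trigger definition that fires on blob creation looks roughly like the following (the names, path, and scope are placeholders, and the exact schema should be checked against the ADF documentation for your factory):

```json
{
  "name": "OnUploadFolderCreated",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/uploads/blobs/",
      "ignoreEmptyBlobs": true,
      "events": ["Microsoft.Storage.BlobCreated"],
      "scope": "<resource ID of the storage account>"
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CleanAndLoadPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

Under the hood this kind of trigger relies on Event Grid, so you would not need to wire up Event Grid subscriptions separately just to start a pipeline.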

