Reasons to use Azure Data Lake Analytics vs Traditional ETL approach

Question

I'm considering using the Data Lake technologies I have been studying for the last few weeks, compared with the traditional ETL SSIS scenarios I have been working with for so many years.

I think of Data Lake as something closely linked to big data, but where is the line between using Data Lake technologies vs SSIS?

Is there any advantage to using Data Lake technologies with 25MB ~ 100MB ~ 300MB files? Parallelism? Flexibility? Extensibility in the future? Is there any performance gain when the files to be loaded are not as big as U-SQL's best-case scenario?

What are your thoughts? Would it be like using a hammer to crack a nut? Please don't hesitate to ask me any questions to clarify the situation. Thanks in advance!

Edit 21/03 - more clarifications:

  1. It has to be on the cloud.
  2. The reason I considered ADL is that there is no substitute for SSIS in the cloud. There is ADF, but it's not the same: it orchestrates the data, but it's not as flexible as SSIS.
  3. I thought I could use U-SQL for some (basic) transformations (a minimal sketch of what I mean is shown after this list), but I see some problems:
    • There are many basic things I cannot do: loops, updates, writing logs to a SQL table...
    • The output can only be a U-SQL table or a file. The architecture doesn't look good this way (even though U-SQL is very good with big files) if I need an extra step to export the file to another DB or DWH - or maybe this is the way it's done in big data warehouses... I don't know.
    • In my tests, it takes 40s for a 1MB file and 1m15s for a 500MB file. I cannot justify a 40s process for 1MB (plus uploading to the database/data warehouse with ADF).
    • The code looks disorganised to a user, because scripts with many basic validations end up as very long U-SQL scripts.
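
For reference, this is the kind of basic U-SQL transformation I mean: a minimal sketch (file paths, columns and the aggregation are made up for illustration) that extracts from a file, transforms, and outputs to a file, since a file or a U-SQL table are the only sinks available.

    // Minimal U-SQL sketch (hypothetical paths and columns).
    @sales =
        EXTRACT CustomerId int,
                Amount decimal,
                EventDate DateTime
        FROM "/input/sales.csv"
        USING Extractors.Csv(skipFirstNRows: 1);

    @summary =
        SELECT CustomerId,
               SUM(Amount) AS TotalAmount
        FROM @sales
        WHERE EventDate >= DateTime.Parse("2017-01-01")
        GROUP BY CustomerId;

    // The only possible outputs: a file (as here) or a U-SQL catalog table.
    OUTPUT @summary
    TO "/output/sales_summary.csv"
    USING Outputters.Csv(outputHeader: true);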

Don't get me wrong, I really like the ADL technologies, but I think that for now they are for something very specific, and there is still no substitute for SSIS in the cloud. What do you think? Am I wrong?

Answer

For me, if the data is highly structured and relational, the right place for it is a relational database. In Azure you have several choices:

  1. SQL Server on a VM (IaaS): ordinary SQL Server running on a VM. You have to install, configure and manage it yourself, but you get the full flexibility of the product.
  2. Azure SQL Database: a PaaS database option targeted at smaller volumes but now up to 4TB. All of the features of normal SQL Server, with potentially lower TCO and the option to scale up or down using tiers.
  3. Azure SQL Data Warehouse (ADW): an MPP product suitable for large warehouses. For me, the entry criterion is a warehouse at least 1TB in size, and probably more like 10TB. It's really not worth having an MPP product for small volumes.

For all of the database options you can use clustered columnstore indexes (the default in ADW), which can give massive compression, between 5x and 10x.
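
For example, a minimal T-SQL sketch (the table name dbo.FactSales is a made-up placeholder); in ADW new tables get columnstore storage by default, so the explicit index is only needed on the other options:

    -- Hedged sketch: convert a (hypothetical) fact table to clustered columnstore storage.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
        ON dbo.FactSales;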

400MB per day for a year totals ~143GB (400MB x 365 days ≈ 146,000MB), which honestly is not that much in modern data warehouse terms, where volumes are normally measured in terabytes.

Where Azure Data Lake Analytics (ADLA) comes in is doing things you cannot do in ordinary SQL, like:

  • combine the power of C# with SQL for powerful queries - example here (see also the sketch after this list)
  • deal with unstructured files like images, XML or JSON - example here
  • use RegEx
  • scale out R processing - example here
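
As a rough illustration of the first and third points, here is a hedged U-SQL sketch (the input path, columns and expressions are made up) mixing C# expressions and a .NET Regex call into a SQL-like query:

    // Hedged sketch: C# expressions and .NET Regex inside a U-SQL query.
    @logs =
        EXTRACT Url string,
                UserAgent string
        FROM "/input/weblogs.tsv"
        USING Extractors.Tsv();

    @classified =
        SELECT Url,
               // C# ternary and string methods evaluated per row
               (UserAgent.ToLowerInvariant().Contains("mobile") ? "Mobile" : "Desktop") AS DeviceClass,
               // .NET regular expressions are available directly
               System.Text.RegularExpressions.Regex.IsMatch(UserAgent, "(?i)bot|crawler") AS IsBot
        FROM @logs;

    OUTPUT @classified
    TO "/output/classified.csv"
    USING Outputters.Csv();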

ADLA also offers federated queries: the ability to "query data where it lives", i.e. bring together structured data from your database and unstructured data from your lake.
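
Roughly, a federated query looks something like the sketch below; it assumes a data source (here called MyAzureSqlDb) has already been registered in the ADLA catalog with CREATE DATA SOURCE and a stored credential, and the table and column names are placeholders:

    // Hedged sketch of a U-SQL federated query against an already-registered
    // Azure SQL Database data source; the registration steps are omitted here.
    @customers =
        SELECT CustomerId,
               Name
        FROM EXTERNAL MyAzureSqlDb LOCATION "dbo.Customers";

    OUTPUT @customers
    TO "/output/customers.csv"
    USING Outputters.Csv();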

Your decision seems to be more about whether or not you should be using the cloud. If you need the elastic and scalable features of the cloud, then Azure Data Factory is the tool for moving data from place to place in the cloud.

HTH
