How to choose between Azure Data Lake Analytics and Azure Databricks

Question

Azure Data Lake Analytics and Azure Databricks can both be used for batch processing. Could anyone please help me understand when to choose one over the other?

Answer

In my humble opinion, a lot of it comes down to existing skillsets. If you have a team experienced in Spark, Java, Python, R or Scala, then Databricks is a natural fit. If, on the other hand, you have a team with existing SQL and C# skills, then the learning curve for U-SQL will be less steep.

That aside, there are other questions which can draw out the differences:

  • Do you require real-time interaction (Databricks) or batch-mode analytics (both)? There is a feedback item requesting real-time interactivity for U-SQL; please vote for it.
  • Do you want a pay-as-you-go model (U-SQL) or clusters with auto-terminate after a certain period (Databricks)?
  • Do you prefer working in a notebook (Databricks) or with Visual Studio / VS Code / PowerShell / the .NET SDK (U-SQL)?
  • Do you want to use Spark libraries like GraphX (Databricks)?
  • Do you want the ability to run and scale any runtime (U-SQL)? See here for more details.
  • Do you want a local development emulator (U-SQL)? The U-SQL emulator in Visual Studio is seamless, i.e. you develop your code against your local drives in the same structure as your lake (for free), then simply click the drop-down in Visual Studio to run in the cloud. Although I think you can have a local Spark environment, I'm not sure what the local (and disconnected) development experience is for Databricks.
  • Are you using ADLS Gen 2 (only Databricks)? See here.
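
If it helps to make the checklist above concrete, here is a minimal sketch in Python that encodes the bullets as data and tallies them against a list of requirements. The names `CHECKLIST`, `tally` and `blockers` and the requirement keys are all hypothetical, made up for illustration; the mapping only mirrors the opinions in this answer and is not an official feature matrix for either service.

```python
# Hypothetical helper: the requirement keys and the service each one maps to
# simply restate the bullets in this answer. Nothing here calls an Azure API.
CHECKLIST = {
    "realtime_interaction": {"Databricks"},
    "batch_analytics": {"Databricks", "U-SQL"},
    "pay_as_you_go": {"U-SQL"},
    "auto_terminating_clusters": {"Databricks"},
    "notebooks": {"Databricks"},
    "visual_studio_tooling": {"U-SQL"},
    "spark_libraries": {"Databricks"},
    "run_any_runtime": {"U-SQL"},
    "local_emulator": {"U-SQL"},
    "adls_gen2": {"Databricks"},
}

SERVICES = ("Databricks", "U-SQL")


def tally(requirements):
    """Count how many of the stated requirements each service satisfies."""
    scores = {service: 0 for service in SERVICES}
    for req in requirements:
        for service in CHECKLIST[req]:
            scores[service] += 1
    return scores


def blockers(requirements):
    """List, per service, the requirements the checklist says it does not meet."""
    return {
        service: [r for r in requirements if service not in CHECKLIST[r]]
        for service in SERVICES
    }


if __name__ == "__main__":
    needs = ["batch_analytics", "notebooks", "adls_gen2"]
    print(tally(needs))
    print(blockers(needs))
```

A raw count is a blunt instrument, so the `blockers` view is probably the more useful one: a single hard requirement (ADLS Gen 2, say) can rule a service out regardless of the score. Team skillset, which I would weigh first, is deliberately left out of the table.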

UPDATE October 2018: As far as I am aware, U-SQL does not currently support ADLS Gen 2, which would count against it (happy to be corrected). I will update the post if and when that support is added.

UPDATE January 2019: U-SQL has not had any meaningful updates since Spring 2018.

HTH
