What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?


Question

I am using Google Cloud Dataflow to implement an ETL data warehouse solution.

Looking into the Google Cloud offering, it seems Dataproc can also do the same thing.

It also seems Dataproc is a little cheaper than Dataflow.

Does anybody know the pros and cons of Dataflow over Dataproc?

Why does Google offer both?

Answer

Yes, Cloud Dataflow and Cloud Dataproc can both be used to implement ETL data warehousing solutions.

An overview of why each of these products exists can be found in the Google Cloud Platform Big Data Solutions articles.

In brief:

  • Cloud Dataproc provides you with a Hadoop cluster on GCP, with access to Hadoop-ecosystem tools (e.g. Apache Pig, Hive, and Spark); this has strong appeal if you are already familiar with Hadoop tools and have existing Hadoop jobs
  • Cloud Dataflow provides you with a place to run Apache Beam based jobs on GCP, without your needing to address the common operational aspects of running jobs on a cluster (e.g. balancing work, or scaling the number of workers for a job; by default this is managed automatically for you, and applies to both batch and streaming), which can be very time-consuming on other systems
    • Apache Beam is an important consideration; Beam jobs are intended to be portable across "runners", which include Cloud Dataflow, and enable you to focus on your logical computation rather than on how a runner works. In comparison, when authoring a Spark job, your code is bound to one runner, Spark, and to how that runner works
    • Cloud Dataflow also offers the ability to create jobs based on "templates", which can help simplify common tasks where the only differences are parameter values
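To make the runner-portability point above concrete, here is a toy sketch in plain Python (this is NOT the real Apache Beam API, just an illustration of the separation Beam enforces): the logical computation is written once, and each "runner" decides how to execute it. All class and function names here are hypothetical.

```python
def word_count(lines):
    """The logical computation, written once, independent of any runner."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

class LocalRunner:
    """Executes the computation in-process (analogous to Beam's DirectRunner)."""
    def run(self, fn, data):
        return fn(data)

class ManagedCloudRunner:
    """Stands in for a managed service like Cloud Dataflow, which would shard
    the input across workers and autoscale, with no change to word_count."""
    def run(self, fn, data):
        # A real runner would distribute `data`; we execute locally so the
        # sketch stays self-contained.
        return fn(data)

# The same pipeline logic runs unchanged on either runner.
data = ["beam on dataflow", "spark on dataproc", "beam"]
assert LocalRunner().run(word_count, data) == ManagedCloudRunner().run(word_count, data)
```

In real Beam, this separation shows up as the `--runner` pipeline option: the same pipeline code can target `DirectRunner` locally or `DataflowRunner` on GCP, whereas a Spark job is written against Spark's own APIs and execution model.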

