Apache Spark vs Apache Spark 2


Question

What improvements does Apache Spark 2 bring compared to Apache Spark 1.x?

  1. From an architecture perspective
  2. From an application perspective
  3. Or anything else

Recommended answer

Apache Spark 2.0.0 APIs have stayed largely similar to 1.x, but Spark 2.0.0 does include breaking API changes.

Apache Spark 2.0.0 is the first release on the 2.x line. The major updates are API usability, SQL 2003 support, performance improvements, Structured Streaming, R UDF support, and operational improvements.
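As a rough illustration of the API usability and SQL points, here is a minimal Scala sketch. It assumes Spark 2.x is on the classpath and uses `SparkSession`, the entry point introduced in 2.0 in place of separate `SQLContext` and `HiveContext` objects. The `sales` view, its columns, the sample rows, and the helper names are hypothetical and exist only for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SparkSessionSketch {
  // Builds a local session; the app name and "local[*]" master are illustrative choices.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("spark2-entry-point-sketch")
      .master("local[*]")
      .getOrCreate()

  // Registers hypothetical sample data and queries it with a scalar subquery,
  // one of the SQL forms that became available in Spark 2.0.
  def aboveAverage(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val sales = Seq(("east", 10), ("west", 25), ("east", 5)).toDF("region", "amount")
    sales.createOrReplaceTempView("sales")
    spark.sql(
      "SELECT region, amount FROM sales " +
      "WHERE amount > (SELECT avg(amount) FROM sales)")
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    aboveAverage(spark).show()
    spark.stop()
  }
}
```

In 1.x code the same query would typically go through a `SQLContext` created from a `SparkContext`; here a single `SparkSession` covers both roles.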

New in Spark 2:

  • The biggest change that I can see is that the Dataset and DataFrame APIs will be merged.
  • The latest release of Spark will be considerably more efficient than its predecessors. Spark 2.0 is going to focus on a combination of Parquet and caching to achieve even better throughput.
  • Structured Streaming is another big thing!
  • It will be the first version that focuses on ETL. Successive versions will add more operators and libraries for ETL.

You can go through the Spark 2.0.0 release notes, which explain the updates under the following headings:

      • API Stability
      • Core and Spark SQL
      • MLlib
      • SparkR
      • Streaming
      • Dependency, Packaging, and Operations
      • Removals, Behavior Changes and Deprecations
      • Known Issues
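To make the Dataset and DataFrame point above more concrete, the following Scala sketch shows how the two APIs relate in Spark 2.x, where `DataFrame` is a type alias for `Dataset[Row]`. It assumes Spark 2.x on the classpath; the `Person` case class, the sample rows, and the helper names are hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object DatasetUnificationSketch {
  // Typed operation: the lambda works on Person objects and is checked at compile time.
  def adults(people: Dataset[Person]): Dataset[Person] =
    people.filter((p: Person) => p.age >= 18)

  // Hypothetical sample data.
  def samplePeople(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    Seq(Person("Ann", 34), Person("Bob", 12)).toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-unification-sketch") // illustrative name
      .master("local[*]")                    // illustrative local master
      .getOrCreate()
    import spark.implicits._

    val people = samplePeople(spark)

    // Untyped view of the same data; the two declared types are interchangeable.
    val untyped: DataFrame = people.toDF()
    val sameThing: Dataset[Row] = untyped
    sameThing.select("name").show()

    // Going back to the typed API by naming the target type.
    val typedAgain: Dataset[Person] = untyped.as[Person]
    adults(typedAgain).show()

    spark.stop()
  }
}
```

The practical effect is that column-based operations such as `select` and typed operations such as a `filter` over `Person` objects can be mixed on the same data without converting between separate abstractions.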
