Spark on EMR: running time did not decrease when the number of nodes increased

Views: 196 Published: 2016/5/22 16:41:07 Tags: amazon-web-services amazon-s3 apache-spark emr

Problem description

My Spark program reads a large number of zip files containing JSON data from S3. It cleans the data using Spark transformations and then saves the result as Parquet files. When I run the program on 1 GB of data on AWS with a configuration of 10 nodes at 8 GB each, it takes about 11 minutes. I changed to a configuration of 20 nodes at 32 GB each, and it still takes about 10 minutes, a reduction of only about 1 minute. Why does this happen?
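To put the reported numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (10 nodes and about 11 minutes, 20 nodes and about 10 minutes). Applying Amdahl's law is an assumption made for illustration, not a diagnosis taken from the question, and it ignores the change in memory per node. The function names are hypothetical.

```python
# Figures quoted in the question; everything else is an illustrative assumption.
NODES_BEFORE, MINUTES_BEFORE = 10, 11.0
NODES_AFTER, MINUTES_AFTER = 20, 10.0


def observed_speedup(t_before: float, t_after: float) -> float:
    """Ratio of the old wall-clock time to the new one."""
    return t_before / t_after


def scalable_fraction(speedup: float, scale: float) -> float:
    """Solve Amdahl's law, speedup = 1 / ((1 - p) + p / scale), for p.

    p is the fraction of the original runtime that shrinks in proportion
    to the added resources (assumed model, not measured).
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / scale)


scale = NODES_AFTER / NODES_BEFORE
speedup = observed_speedup(MINUTES_BEFORE, MINUTES_AFTER)
p = scalable_fraction(speedup, scale)

print(f"node scale-up: {scale:.1f}x")
print(f"observed speedup: {speedup:.2f}x")
print(f"implied scalable fraction of the 10-node run: {p:.0%}")
```

Under that simplified model, doubling the node count gave a speedup of only about 1.1x, which would mean that most of the 11 minutes was spent in work that did not spread across the extra nodes.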
