Data motion in Cassandra/HDFS and Spark

Published: 2016/5/22 16:21:19 · Tags: hadoop cassandra apache-spark hdfs distributed-computing

Question

When designing a distributed storage and analytics architecture, is it a common usage pattern to run an analytics engine on the same machine as the data nodes? Specifically, would it make sense to run Spark/Storm directly on Cassandra/HDFS nodes?

I know that MapReduce on HDFS follows this usage pattern, since according to Hortonworks (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_using-apache-hadoop/content/yarn_overview.html), YARN minimizes data motion. I have no idea whether the same holds for these other systems, though. I would imagine it does, since they seem so pluggable with each other, but I can't find any information about this online.
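
The idea behind "minimizing data motion" is data locality: the scheduler prefers to run a task on a machine that already stores a replica of the data that task reads. The sketch below is a toy model only, not the actual YARN or Spark scheduler; the function name `assign_tasks`, the greedy least-loaded tie-breaking, and the hostnames are hypothetical. It borrows two of Spark's real locality level names (`NODE_LOCAL` and `ANY`) to label each placement, and shows why co-locating executors with data nodes matters.

```python
from collections import defaultdict


def assign_tasks(partitions, executors):
    """Toy greedy locality-aware task placement (hypothetical, for illustration).

    partitions: dict mapping partition id -> set of hostnames holding a replica
    executors:  list of hostnames that run a compute executor
    Returns a dict mapping partition id -> (host, locality level).
    """
    if not executors:
        raise ValueError("need at least one executor host")
    load = defaultdict(int)  # number of tasks already placed on each host
    assignment = {}
    for pid in sorted(partitions):
        replicas = partitions[pid]
        # Prefer executors that sit on a node already storing this partition.
        local = [h for h in executors if h in replicas]
        candidates = local if local else executors
        # Among candidates, pick the least-loaded host (ties broken by name).
        host = min(candidates, key=lambda h: (load[h], h))
        load[host] += 1
        # NODE_LOCAL: data read from local disk; ANY: data crosses the network.
        assignment[pid] = (host, "NODE_LOCAL" if local else "ANY")
    return assignment
```

With executors on the storage nodes themselves (for example `["node1", "node2", "node3"]` for partitions replicated on those hosts), every task is placed `NODE_LOCAL`. With a separate compute cluster (for example `["spark1", "spark2"]`), no executor holds a replica, so every task is `ANY` and each partition must be shipped over the network before processing.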

I'm sort of a newbie on this topic, so any resources or answers would be greatly appreciated.

Thanks