.NET and Hadoop - What should I know / learn and what is available?


Question

My question is about BigData in .NET. BigData is used to store and query huge amounts of data (Facebook, Google, Twitter, ...). Examples of BigData technologies are MapReduce, Hadoop, Dryad, etc.

Microsoft dropped its Dryad (DryadLINQ) alternative in favor of Hadoop (Dryad and the article), so I'd like to prepare myself for Hadoop and everything that has to do with it.

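What is available now?

  • Hadoop connectors
  • SQL Server 2012 RC (don't use it in production :))
  • Information about Microsoft Big Data

What more should I know about release and development?

  • Sign up for the TechPreview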

Question 1: What should I know about Hadoop that isn't specific to the .NET platform (how to query, specific patterns, architecture, ...) and that will be useful in a .NET environment?

Question 2: Is there more information about Hadoop on the .NET platform than what I already know?

Recommended answer

It's a vague question, so here's a vague answer :)

Hadoop on its own is a tool for running map-reduce jobs in a cluster. It is highly optimized for performance, and a good deal of that optimization comes from distributing the data in a way that makes it easy to consume without incurring I/O penalties.
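To make "map-reduce job" a bit more concrete, here is a minimal sketch of the three phases (map, shuffle, reduce) as a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key (hypothetical stand-in for the
    grouping a map-reduce framework performs between the two phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values collected for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["big data in dotnet", "big clusters big data"])` counts each word across both lines. In a real cluster the map calls and the reduce calls would run on different machines, which is where the data distribution mentioned above starts to matter.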

For this you should read about HDFS and the internals that explain how this is done. In a nutshell, the input data is clumped together on nodes so that processes run locally and read sequentially (this is a property/limitation of HDFS).
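A toy sketch of that idea, assuming a much simpler model than the real thing: the input is cut into fixed-size blocks, each block is stored on one node, and each task is scheduled on the node that already holds its block. The helper names and the round-robin placement are assumptions for illustration only; real HDFS uses far larger blocks, keeps several replicas of each one, and has its own placement policy.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: List[str]) -> Dict[int, str]:
    """Assign each block to one node round-robin.

    This is a hypothetical placement rule, not the HDFS policy.
    """
    return {block_id: nodes[block_id % len(nodes)] for block_id in range(num_blocks)}


def schedule_tasks(placement: Dict[int, str]) -> Dict[str, List[int]]:
    """Give every node the tasks for the blocks it already stores,
    so each block is read from local disk instead of over the network."""
    tasks: Dict[str, List[int]] = {}
    for block_id, node in placement.items():
        tasks.setdefault(node, []).append(block_id)
    return tasks


blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_blocks(len(blocks), ["node-a", "node-b"])
tasks = schedule_tasks(placement)
```

With 10 bytes and a block size of 4 this yields three blocks; `node-a` ends up processing blocks 0 and 2 and `node-b` block 1, each reading only its own local data from start to end.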

This way you feed in your "BigData" and it gets split and processed in the most efficient way inside the cluster.

That's all there is to Hadoop itself. There are tools that work on top of it and let you perform high-level abstractions on the data (map-reduce is among the simplest procedures).

Among them:

  • Pig http://pig.apache.org/ a language for working with the map-reduce process and constructing more complex operations
  • Hive http://hive.apache.org/ similar to Pig but more SQL-oriented
  • Cascading http://www.cascading.org/ yet another one, more focused on data flow than on queries
  • Cascalog https://github.com/nathanmarz/cascalog based on Cascading, written in Clojure
  • HBase http://hbase.apache.org/ a type of NoSQL database on top of HDFS
  • ElephantDB https://github.com/nathanmarz/elephantdb another NoSQL database for Hadoop
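As a rough illustration of what the SQL-oriented tools in this list save you from writing by hand, here is a sketch of how a grouped aggregate (something like `SELECT country, SUM(amount) ... GROUP BY country`) breaks down into a map step, a sort/shuffle step and a reduce step. It is plain Python, not Hive or Pig code; the function `group_by_sum` and the `country`/`amount` columns are invented for the example, and it says nothing about how those tools actually plan their jobs.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List


def group_by_sum(rows: List[Dict[str, Any]], key_col: str, value_col: str) -> Dict[Any, Any]:
    """Compute SUM(value_col) per distinct key_col using map, sort, reduce."""
    # Map: project every row down to a (key, value) pair.
    pairs = [(row[key_col], row[value_col]) for row in rows]
    # Shuffle: sort by key so that equal keys end up next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: sum the values of each run of identical keys.
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(pairs, key=itemgetter(0))
    }


sales = [
    {"country": "BE", "amount": 10},
    {"country": "US", "amount": 5},
    {"country": "BE", "amount": 7},
]
totals = group_by_sum(sales, "country", "amount")
```

Here `totals` holds one sum per country. The point is only that a one-line query hides this kind of plumbing, which is the appeal of the higher-level tools.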

Specifics for .NET

For Hadoop on Azure (.NET), there's an introduction on MSDN, with more information on building Hadoop applications through their platform. It's only a CTP for now, but of course this will change.

Here is one about Hadoop that comes with code.

Additionally, there's a company that frequently publishes information about Hadoop: Cloudera. Check their site regularly; it covers all the concepts behind Hadoop (it's pretty advanced, though).

I'm pretty sure this isn't what you were looking for, but I have no idea what you want, so I hope you can at least check out a few new projects that may help.

Also check out Storm: https://github.com/nathanmarz/storm. It's not related to Hadoop, but it handles real-time scenarios, which Hadoop is not suited for.
