.Net and Hadoop - What to know / learn and what is available?


Question

My question is about BigData in .Net. BigData is used to store and query huge amounts of data (Facebook, Google, Twitter, ...). Examples of BigData technologies are MapReduce, Hadoop, Dryad, ...

Microsoft dropped their Dryad (DryadLinq) alternative in favor of Hadoop (see the article: http://www.zdnet.com/blog/microsoft/microsoft-drops-dryad-puts-its-big-data-bets-on-hadoop/11226), so I'd like to prepare myself for it and everything that has to do with it.

What is available now?

Hadoop connectors

SQL Server 2012 RC https://profile.microsoft.com/RegSysProfileCenter/wizard.aspx?wizid=29ad67d7-2f89-4fbc-9f95-8069f2d34723&lcid=1033 (don't use it in production :))

Microsoft Big Data information http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data-solution.aspx

Where to learn more about releases and development?

Register for the tech preview

Question 1: What should I know about Hadoop that isn't unique to the .Net platform (how to query, specific patterns, architecture, ...) and that will be useful in a .Net environment?

Question 2: Is there more information on Hadoop on the .Net platform than what I already know?

Recommended answer

It's a vague question, so here's a vague answer :)

Hadoop on its own is a tool for running map-reduce jobs on a cluster. It is highly optimized for performance, and a good deal of that optimization comes from distributing the data in a way that makes it easy to consume without incurring I/O penalties.

For this you should read about HDFS and the internals that explain how this is done. In a nutshell, the input data is clumped together on nodes so that the processes run locally and read it sequentially (this is a property/limitation of HDFS).

This way you input your "BigData" and it gets split and processed in the most efficient way inside the cluster.
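The split-then-process flow described above can be sketched in a few lines of plain Python. This is only a toy, in-memory model and not Hadoop code: the function names (`split_input`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the `split_size` parameter are made up for illustration, and real Hadoop distributes the splits across machines rather than looping over them in one process.

```python
from collections import defaultdict
from itertools import chain


def split_input(lines, split_size):
    """Cut the input into fixed-size splits, one per (imaginary) worker."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, split_size=2):
    """Run the three phases in sequence over an in-memory list of lines."""
    splits = split_input(lines, split_size)
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    data = ["big data is big", "hadoop stores data", "hadoop runs jobs"]
    print(word_count(data))
```

Each split is mapped independently of the others, which is the property that lets a cluster run the map step next to the data and only move the much smaller intermediate pairs.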

That's all there is to Hadoop itself. There are tools that work on top of it and let you operate on the data through higher-level abstractions (map-reduce is among the simplest procedures).

These include:

  • Pig http://pig.apache.org/ which is a language to work with the map-reduce process and construct more complex operations
  • Hive http://hive.apache.org/ similar to the previous but more SQL-oriented
  • Cascading http://www.cascading.org/ yet another, more focused on data flow than queries
  • Cascalog https://github.com/nathanmarz/cascalog based on Cascading, written in Clojure
  • HBase http://hbase.apache.org/ a type of NoSQL database on top of HDFS
  • ElephantDB https://github.com/nathanmarz/elephantdb another NoSQL database for Hadoop

Specifics for .NET

For Hadoop on Azure (.Net), there's an introduction on MSDN here: http://channel9.msdn.com/Events/windowsazure/learn/Learn-about-Hadoop-on-Windows-Azure-with-Alex-Stojanovic with more info here: http://blogs.msdn.com/b/sqlcat/archive/2011/12/14/helping-to-make-hadoop-easier-by-going-metro.aspx. Both relate to building Hadoop applications through Microsoft's platform. It's only a CTP for now, but of course this will change.

Here's another good blog post about Hadoop and MapReduce, with code: http://blogs.msdn.com/b/carlnol/archive/2011/12/16/hadoop-streaming-and-f-mapreduce.aspx
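Judging by its URL, that post is about Hadoop Streaming, which lets any executable act as a mapper or reducer by reading records from stdin and writing tab-separated key/value lines to stdout. As a rough sketch of that contract (not the code from the post; Python stands in here purely for illustration, and the `mapper`/`reducer` names and the `map`/`reduce` command-line arguments are hypothetical), a word count could look like this:

```python
import sys
from itertools import groupby


def mapper(lines):
    """Yield one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; the input must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: "script.py map" or "script.py reduce",
    # with records flowing over stdin/stdout.
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

The reducer relies on the framework sorting the mapper output by key before handing it over. Locally, the same flow can be imitated by piping the map stage through `sort` into the reduce stage.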

Additionally, there's a company that frequently publishes information about Hadoop: Cloudera. Check their site regularly for more information; it covers all the concepts around Hadoop (it's pretty advanced, though).

I'm pretty sure this isn't what you were looking for, but I have no idea what you want, so at least I hope you can check out a few new projects that may help.

Also check Storm: https://github.com/nathanmarz/storm. It's not related to Hadoop, but it handles real-time scenarios, which Hadoop is not suited for.
