How to use Cassandra's MapReduce with or without Pig?

Question

Can someone explain how MapReduce works with Cassandra 0.6? I've read through the word count example, but I don't quite follow what happens on the Cassandra end versus the "client" end.

https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/

For instance, say I'm using Python and Pycassa: how would I load a new MapReduce function and then call it? Does my MapReduce function have to be Java that's installed on the Cassandra server? If so, how do I call it from Pycassa?

There's also mention of Pig making this all easier, but I'm a complete Hadoop noob, so that didn't really help.

Your answer can use Thrift or whatever; I only mentioned Pycassa to denote the client side. I'm just trying to understand the difference between what runs in the Cassandra cluster and what runs on the server that actually makes the requests.

Recommended answer

From what I've heard (and from here), a developer writes a MapReduce program that uses Cassandra as the data source as follows. You write a regular Hadoop MapReduce program (the example you linked to is the pure-Java version), and the jars that are now available provide a custom InputFormat that lets the job read its input from Cassandra instead of from the default source, which is files in Hadoop's file system (HDFS).
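To make the division of labor concrete, here is a minimal in-memory sketch of that data flow in Python. It is not the Hadoop or Cassandra API: `cassandra_like_input`, `run_job`, and the `"text"` column name are hypothetical stand-ins, and the record shape (row key plus a column map) is only an assumption about what a Cassandra-backed input format hands to a mapper. The point it illustrates is that the map and reduce functions stay ordinary, and only the input source changes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A record is (row_key, {column_name: value}). This shape is illustrative only.
Record = Tuple[str, Dict[str, str]]


def cassandra_like_input(rows: Dict[str, Dict[str, str]]) -> Iterator[Record]:
    """Hypothetical stand-in for a Cassandra input format: one record per row."""
    for row_key, columns in rows.items():
        yield row_key, columns


def word_count_map(record: Record) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in the row's 'text' column."""
    _row_key, columns = record
    for word in columns.get("text", "").split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def run_job(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Tiny single-process imitation of the map, shuffle, and reduce phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)  # "shuffle": group values by key
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample_rows = {
        "row1": {"text": "Hello world"},
        "row2": {"text": "hello Cassandra"},
    }
    print(run_job(cassandra_like_input(sample_rows), word_count_map, word_count_reduce))
```

In a real job, Hadoop runs the mapper and reducer on its own worker nodes, and the input format is the piece that pulls rows out of Cassandra; the client only submits the job.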

If you're using Pycassa, I'd say you're out of luck until either (1) the maintainer of that project adds MapReduce support or (2) you throw together some Python functions that write out a Java MapReduce program and run it. The latter is definitely a bit of a hack, but it would get you up and running.
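As a rough sketch of the "run it" half of option (2), the Python below builds and optionally launches a `hadoop jar` command for an already-compiled job. It assumes the `hadoop` command is on the PATH and that the Java job has been packaged separately; the jar name `word_count.jar`, the class name `WordCount`, and the helper names are hypothetical. Generating the Java source itself is left out.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_hadoop_command(
    jar_path: str, main_class: str, job_args: Sequence[str] = ()
) -> List[str]:
    """Build the argument list for: hadoop jar <jar> <main class> [args...]."""
    return ["hadoop", "jar", jar_path, main_class, *job_args]


def run_hadoop_job(
    jar_path: str,
    main_class: str,
    job_args: Sequence[str] = (),
    dry_run: bool = True,
) -> int:
    """Print the command (dry run) or launch it and return the exit code."""
    command = build_hadoop_command(jar_path, main_class, job_args)
    if dry_run:
        # Only show what would be executed; nothing is launched.
        print(shlex.join(command))
        return 0
    # Assumption: 'hadoop' is installed and configured on this machine.
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Hypothetical jar and class names, for illustration only.
    run_hadoop_job("word_count.jar", "WordCount", dry_run=True)
```

Keeping the dry run as the default makes it easy to check the generated command before anything is submitted to a cluster.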
