How to use Cassandra's MapReduce with or without Pig?

Question

Can someone explain how MapReduce works with Cassandra 0.6? I've read through the word count example, but I don't quite follow what happens on the Cassandra end versus the "client" end.

https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/

For instance, say I'm using Python and Pycassa: how would I load a new MapReduce function and then call it? Does my MapReduce function have to be Java that's installed on the Cassandra server? If so, how do I call it from Pycassa?

There's also mention of Pig making all of this easier, but I'm a complete Hadoop noob, so that didn't really help.

Your answer can use Thrift or whatever; I only mentioned Pycassa to denote the client side. I'm just trying to understand the difference between what runs in the Cassandra cluster and what runs on the server making the requests.

Recommended answer

From what I've heard (and from here), a developer writes a MapReduce program that uses Cassandra as the data source as follows. You write a regular Hadoop MapReduce program (the example you linked to is the pure-Java version), and the jars that are now available provide a custom InputFormat that lets the job read its input from Cassandra instead of the default source, which is Hadoop's own file system (HDFS). The map and reduce functions are therefore ordinary Hadoop code that Hadoop runs; they are not loaded into Cassandra through a client library.
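To make that division of labor concrete, here is a minimal pure-Python sketch of the word count data flow. It is not Hadoop or Cassandra code: `fake_input_format`, `map_row`, `reduce_word`, and `run_job` are hypothetical names, the column name `text` is an assumption, and the in-memory `rows` dict only stands in for a column family that a real InputFormat would read from the cluster.

```python
from collections import defaultdict


def fake_input_format(rows):
    """Stand-in for a Cassandra-backed InputFormat: yields (row_key, columns)."""
    for row_key, columns in rows.items():
        yield row_key, columns


def map_row(row_key, columns, column_name="text"):
    """Map step: emit (word, 1) for every word in one column of the row."""
    value = columns.get(column_name, "")
    for word in value.split():
        yield word, 1


def reduce_word(word, counts):
    """Reduce step: sum the counts that were emitted for one word."""
    return word, sum(counts)


def run_job(rows):
    """Drive map, shuffle (group by word), and reduce, as a framework would."""
    grouped = defaultdict(list)
    for row_key, columns in fake_input_format(rows):
        for word, count in map_row(row_key, columns):
            grouped[word].append(count)
    return dict(reduce_word(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample data: row key -> {column name: column value}.
    sample_rows = {
        "key1": {"text": "hello cassandra hello hadoop"},
        "key2": {"text": "hadoop reads from cassandra"},
    }
    print(run_job(sample_rows))
```

In a real job only the role played by `fake_input_format` changes: the input format fetches rows from Cassandra, while the map and reduce functions stay ordinary framework code.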

If you're using Pycassa, I'd say you're out of luck until either (1) the maintainer of that project adds support for MapReduce or (2) you throw together some Python functions that write out a Java MapReduce program and run it. The latter is definitely a bit of a hack, but it would get you up and running.
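As a rough illustration of the "run it" half of option (2), the sketch below launches an already-built Java MapReduce jar from Python through the standard `hadoop jar` command. It assumes the `hadoop` executable is on the PATH; the jar name `word_count.jar`, the main class `WordCount`, and the helper names `build_hadoop_command` and `run_hadoop_job` are hypothetical placeholders, not part of Pycassa or Cassandra.

```python
import subprocess


def build_hadoop_command(jar_path, main_class, job_args=()):
    """Build the argument list for `hadoop jar <jar> <main class> [args...]`."""
    return ["hadoop", "jar", jar_path, main_class, *job_args]


def run_hadoop_job(jar_path, main_class, job_args=()):
    """Run the job as a child process and return its exit code.

    Assumes the `hadoop` executable is installed and on the PATH.
    """
    command = build_hadoop_command(jar_path, main_class, job_args)
    completed = subprocess.run(command, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical jar and class names; only the command is printed here.
    print(" ".join(build_hadoop_command("word_count.jar", "WordCount")))
```

Keeping the command construction separate from the call to `subprocess.run` makes it possible to inspect or log the command without a Hadoop installation.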
