Pig & Cassandra & DataStax Splits Control


Question

I have been using Pig with my Cassandra data to do all kinds of amazing feats of grouping that would be almost impossible to write imperatively. I am using DataStax's integration of Hadoop & Cassandra, and I have to say it is quite impressive. Hats off to those guys!

I have a pretty small sandbox cluster (2 nodes) where I am putting this system through some tests. I have a CQL table with ~53M rows (about 350 bytes each), and I notice that the mapper takes a very long time to grind through these 53M rows. I started poking around the logs and I can see that the map is spilling repeatedly (I saw 177 spills from the mapper), and I think this is part of the problem.

The combination of CassandraInputFormat and JobConfig creates only a single mapper, so this mapper has to read 100% of the rows from the table. I call this anti-parallel :)

Now, there are a lot of gears at work in this picture, including:


  • 2 physical nodes

  • The Hadoop nodes are located in …

Can anybody point me in the direction of how to get Pig to create more input splits so I can run more mappers? I have 23 slots; it seems a pity to use only one all the time.

Or am I completely mad and don't understand the problem? I welcome both kinds of answers!

Recommended answer

You should set pig.noSplitCombination = true. You can do this in any of the following places.

When invoking the script:

dse pig -Dpig.noSplitCombination=true /path/to/script.pig

In the Pig script itself:

SET pig.noSplitCombination true;
table = LOAD 'cfs://ks/cf' USING CqlStorage();

Or permanently in /etc/dse/pig/pig.properties. Uncomment:

pig.noSplitCombination=true

Otherwise, Pig may combine all of your input splits into one, which shows up in the log as "Total input paths (combined) to process : 1".
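For the permanent option, here is a minimal sketch of uncommenting the property from the shell instead of editing the file by hand. The function name enable_no_split_combination is hypothetical, and the sketch assumes the property is present in the file as a line commented out with a leading "#"; check your own pig.properties before relying on it.

```shell
# Hypothetical helper (not part of DSE or Pig): uncomment the
# pig.noSplitCombination line in a properties file.
# The default path is the one named in the answer; pass another path
# as the first argument to try it on a copy first.
enable_no_split_combination() {
    props="${1:-/etc/dse/pig/pig.properties}"

    # Assumption: the property exists as a line like "#pig.noSplitCombination=true".
    # Remove the leading '#' from that line only, keeping a .bak backup of the file.
    sed -i.bak -E \
        's/^[[:space:]]*#[[:space:]]*(pig\.noSplitCombination[[:space:]]*=.*)$/\1/' \
        "$props"

    # Print the resulting line so the change can be checked by eye.
    grep -n '^pig\.noSplitCombination' "$props"
}
```

Calling enable_no_split_combination with no argument targets /etc/dse/pig/pig.properties and will likely need root privileges; running it against a copy of the file first is the safer way to confirm the pattern matches.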
