Group key value of map in Pig


Problem description

I am new to Pig scripting. Say we have a file:

[a#1,b#2,c#3]
[a#4,b#5,c#6]
[a#7,b#8,c#9]

Pig script:

A = LOAD 'txt' AS (in: map[]);
B = FOREACH A GENERATE in#'a';
DUMP B;

We know that we can look up a value by supplying its key. In the example above, I extracted the values stored under the key "a". Assuming that I don't know the keys in advance, I want to group the values by key across the whole relation and dump the result:

(a,{1,4,7})
(b,{2,5,8})
(c,{3,6,9})    

Does Pig allow such an operation, or do I need to use a UDF? Please help me with this. Thanks.

Solution

You can create a custom UDF which converts the map to a bag (using Pig v0.10.0):

package com.example;

import java.io.IOException;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class MapToBag extends EvalFunc<DataBag> {

    private static final BagFactory bagFactory = BagFactory.getInstance();
    private static final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        try {
            @SuppressWarnings("unchecked")
            Map<String, Object> map = (Map<String, Object>) input.get(0);
            DataBag result = null;
            if (map != null) {
                result = bagFactory.newDefaultBag();
                for (Entry<String, Object> entry : map.entrySet()) {
                    Tuple tuple = tupleFactory.newTuple(2);
                    tuple.set(0, entry.getKey());
                    tuple.set(1, entry.getValue());
                    result.add(tuple);
                }
            }
            return result;

        }
        catch (Exception e) {
            throw new RuntimeException("MapToBag error", e);
        }
    }
}

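Then, once the compiled UDF is available to Pig (for example via a REGISTER statement for its jar), flatten the resulting bag into (key, value) pairs: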

B = foreach A generate 
      flatten(com.example.MapToBag(in)) as (k:chararray, v:chararray);
describe B;
B: {k: chararray,v: chararray}

Now group by key and use a nested foreach:

C = foreach (group B by k) {
    value = foreach B generate v;
    generate group as key, value;
};
dump C;
(a,{(1),(4),(7)})
(b,{(2),(5),(8)})
(c,{(3),(6),(9)})
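
To make the data flow easier to follow, here is a minimal plain-Java sketch of the same two steps: flattening each map into (key, value) pairs, then grouping the values by key. It does not use Pig at all. The class name `GroupMapValuesSketch` and its helper methods are hypothetical names chosen for illustration, and the values are treated as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; it does not use or depend on Pig.
class GroupMapValuesSketch {

    // Step 1: mirrors FLATTEN(MapToBag(in)), turning every map into (key, value) pairs.
    static List<Map.Entry<String, String>> flatten(List<Map<String, String>> rows) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map<String, String> row : rows) {
            pairs.addAll(row.entrySet());
        }
        return pairs;
    }

    // Step 2: mirrors GROUP B BY k plus the nested FOREACH, collecting values per key.
    static Map<String, List<String>> groupByKey(List<Map.Entry<String, String>> pairs) {
        // TreeMap keeps keys sorted so the printed result is stable.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Helper that builds one input row with the keys a, b and c from the question.
    static Map<String, String> row(String a, String b, String c) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("a", a);
        m.put("b", b);
        m.put("c", c);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
                row("1", "2", "3"), row("4", "5", "6"), row("7", "8", "9"));
        // prints {a=[1, 4, 7], b=[2, 5, 8], c=[3, 6, 9]}
        System.out.println(groupByKey(flatten(rows)));
    }
}
```

Unlike this sketch, a Pig bag is unordered, so the order of the values inside each bag in the dump above should not be relied on.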
