How to turn (A, B, C) into (AB, AC, BC) with Pig?


Question

In Pig, given the bag (A, B, C), can I somehow compute all unique combinations of its values? The result I'm looking for is something like (AB, AC, BC). I'm disregarding BA, CA, and CB, since they would duplicate the existing values once each pair is sorted alphabetically.

Recommended answer

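One way to do this is to write a UDF (user-defined function) in Java. The one below takes the bag as its first argument and returns a bag containing each pair of tuples exactly once, with the fields of the two tuples concatenated, e.g. (A,B). It pairs tuples in the order the bag yields them and does not sort, so order the bag beforehand if the alphabetical order within each pair matters: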

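import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class CombinationsUDF extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple input) throws IOException {
        // Pig convention: return null when there is nothing to process.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }

        // Copy the bag into a list so its tuples can be addressed by index.
        List<Tuple> bagValues = new ArrayList<Tuple>();
        for (Tuple t : (DataBag) input.get(0)) {
            bagValues.add(t);
        }

        // Pair each tuple with every tuple after it, so each unordered pair appears once.
        TupleFactory tupleFactory = TupleFactory.getInstance();
        List<Tuple> outputTuples = new ArrayList<Tuple>();
        for (int i = 0; i < bagValues.size() - 1; i++) {
            List<Object> currentTupleValues = bagValues.get(i).getAll();

            for (int j = i + 1; j < bagValues.size(); j++) {
                // Concatenate the fields of both tuples, e.g. (A) and (B) become (A,B).
                List<Object> aux = new ArrayList<Object>(currentTupleValues);
                aux.addAll(bagValues.get(j).getAll());
                outputTuples.add(tupleFactory.newTuple(aux));
            }
        }

        return BagFactory.getInstance().newDefaultBag(outputTuples);
    }
}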
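
The pairing logic in the UDF can also be tried outside Pig. The following is a minimal plain-Java sketch of the same nested index loop, working on strings instead of Pig tuples. The class name `PairCombinations` and the method `combine` are hypothetical names chosen for illustration and are not part of the Pig API. The sort step is an addition made here to satisfy the alphabetical requirement from the question; the UDF above does not sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig.
final class PairCombinations {
    private PairCombinations() {}

    /** Returns every unordered pair of values, each pair joined into one string. */
    static List<String> combine(List<String> values) {
        // Sort a copy so that "AB" is produced rather than "BA", whatever the input order.
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);

        // Same loop shape as the UDF: pair element i with every element after it.
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.size() - 1; i++) {
            for (int j = i + 1; j < sorted.size(); j++) {
                pairs.add(sorted.get(i) + sorted.get(j));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(combine(Arrays.asList("C", "A", "B"))); // prints [AB, AC, BC]
    }
}
```

Because the inner loop always starts at `i + 1`, a list of n values yields n(n-1)/2 pairs, and no pair is produced in both orders.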
