Too many filter matches in Pig


Problem description

I have a list of about 1000 filter keywords, and I need to use it to filter one field of a relation in Pig.

Initially, I declared these keywords like this:

    %declare p1 '.keyword1.';
    ...
    %declare p1000 '.keyword1000.';

I then filter like this:

    Filtered = FILTER SRC BY (not $0 matches '$p1') and (not $0 matches '$p2') and ...... (not $0 matches '$p1000');

    DUMP Filtered;

Assume that my source relation is SRC and that the filter applies to the first field, i.e. $0.

If I reduce the number of filters to 100-200, it works fine. But when the number of filters grows to 1000, it doesn't work.

Can somebody suggest a workaround to get the right results?

Thanks in advance

Solution

You can write a simple filter UDF that performs all the checks, something like:

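 package myudfs;

 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;

 import org.apache.pig.FilterFunc;
 import org.apache.pig.data.Tuple;

 public class MYFILTER extends FilterFunc
 {
     // Keywords to exclude; shared by every instance of the UDF.
     static List<String> filterList = new ArrayList<String>();

     // Static initializer: runs once when the class is loaded.
     static {
         //load all filters into filterList
     }

     @Override
     public Boolean exec(Tuple input) throws IOException {
         if (input == null || input.size() == 0)
             return null;
         try {
             String str = (String) input.get(0);
             // Keep the row only if the field equals none of the keywords (exact match).
             return !filterList.contains(str);
         } catch (Exception e) {
             throw new IOException("Caught exception processing input row ", e);
         }
     }
 }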
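
The `//load all filters` step and the check itself can be tried out in plain Java without Pig on the classpath. The following is only a minimal sketch: `KeywordFilter`, `fromLines`, `keepExact` and `keepContaining` are hypothetical names that do not appear in the answer, and the keywords are assumed to arrive one per line (for example, read from a file). It shows two variants because the UDF above compares the whole field exactly, whereas the original `matches` conditions were pattern based; which one is wanted depends on the data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Pig: holds the keywords and decides
// whether a field value should be kept.
class KeywordFilter {
    // A HashSet makes the exact-match lookup constant time on average.
    private final Set<String> keywords = new HashSet<>();

    // Builds a filter from one keyword per line, skipping blank lines.
    static KeywordFilter fromLines(List<String> lines) {
        KeywordFilter filter = new KeywordFilter();
        for (String line : lines) {
            String keyword = line.trim();
            if (!keyword.isEmpty()) {
                filter.keywords.add(keyword);
            }
        }
        return filter;
    }

    // Exact match: keep the value only if it is not one of the keywords.
    boolean keepExact(String value) {
        return !keywords.contains(value);
    }

    // Substring match: keep the value only if it contains none of the keywords.
    boolean keepContaining(String value) {
        for (String keyword : keywords) {
            if (value.contains(keyword)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // In a real job the lines would come from a file; a literal list
        // keeps the sketch self-contained.
        KeywordFilter filter = fromLines(Arrays.asList("keyword1", "", "keyword2"));
        System.out.println(filter.keepExact("keyword1"));        // false
        System.out.println(filter.keepExact("xkeyword1x"));      // true
        System.out.println(filter.keepContaining("xkeyword1x")); // false
        System.out.println(filter.keepContaining("other"));      // true
    }
}
```

In the Pig script, the long chain of `matches` conditions would then be replaced by a single call to the UDF, along the lines of `REGISTER myudfs.jar;` followed by `Filtered = FILTER SRC BY myudfs.MYFILTER($0);`, where the jar name is an assumption.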
