How to FILTER Cassandra TimeUUID/UUID in Pig


Question

Here is my Cassandra schema, using DataStax Enterprise:

CREATE KEYSPACE applications
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

USE applications;

CREATE TABLE events(
  bucket text, 
  id timeuuid,
  app_id uuid,  
  event text, 
  PRIMARY KEY(bucket, id)
);

I want to FILTER in Pig by app_id (UUID) and id (TimeUUID). Here is my Pig script:

events = LOAD 'cql://applications/events'
  USING CqlStorage()
  AS (bucket: chararray, id: chararray, app_id: chararray, event: chararray);

result = FOREACH events GENERATE bucket, id, app_id;
DESCRIBE result;
DUMP result;

The result is as follows:

result: {bucket: chararray,id: chararray,app_id: chararray}
(2014-02-28-04,?O]??4??p??M?,;??F? (|?Mb)
(2014-02-28-04,?O??4??p??M?,?h^@?E????)
(2014-02-28-04,?V???4??p??M?,;??F? (|?Mb)
(2014-02-28-04,?W?0?4??p??M?,?h^@?E????)
(2014-02-28-04,?X^p?4??p??M?,?h^@?E????)

Notice that the app_id and id fields come back as raw binary, and I need to filter by specific UUID values. Any suggestions?

Recommended answer

You need to use a UDF to convert the binary bytes of the UUID/TimeUUID values to chararray. Don't try to declare those fields as chararray directly in the schema, as in AS (bucket: chararray, id: chararray, app_id: chararray, event: chararray); that only relabels the raw 16 bytes as text, which is why the output above is unreadable.
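As a rough illustration of what such a UDF would do internally, here is a minimal sketch of the byte-to-string conversion in plain Java. It assumes the column value arrives as the 16 big-endian bytes of the UUID, which is how Cassandra serializes uuid and timeuuid. The class and method names (UuidBytes, toUuidString, toBytes) are hypothetical and not part of Pig or Cassandra.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not part of Pig or Cassandra.
final class UuidBytes {

    private UuidBytes() {}

    /**
     * Converts the 16 raw bytes of a uuid/timeuuid value into the
     * canonical 36-character string form. Returns null if the input
     * is null or not exactly 16 bytes long.
     */
    static String toUuidString(byte[] raw) {
        if (raw == null || raw.length != 16) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(raw); // ByteBuffer is big-endian by default
        long mostSigBits = buf.getLong();      // first 8 bytes
        long leastSigBits = buf.getLong();     // last 8 bytes
        return new UUID(mostSigBits, leastSigBits).toString();
    }

    /** Inverse of toUuidString, handy for building test input. */
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }
}
```

To use this from Pig, one would presumably wrap toUuidString in a class extending EvalFunc<String>, load id and app_id as bytearray rather than chararray, obtain the byte[] from the DataByteArray via get(), and then FILTER on the resulting string. That wiring is an assumption about a typical setup and is not shown in the original answer.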

Or you can use https://github.com/cevaris/pig-dse/blob/master/src/main/java/com/dse/pig/udfs/AbstractCassandraStorage.java, which converts UUID/TimeUUID to String.

File a Cassandra ticket if you think UUIDs should be converted to strings by default.
