CqlStorage generates wrong Pig schema


Question

I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a schema based on the Cassandra schema, but it seems to be wrong.

If I do this:

data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;

I get:

data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected: each field arrives as a (name, value) tuple rather than as the bare value the schema describes. I don't know why the schema generated by CqlStorage() would be so different.
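To make the mismatch concrete, here is a minimal Python sketch (not Pig) that models the two shapes side by side. The names `declared_schema`, `row`, and `row_to_record` are hypothetical and exist only for illustration; the values are the ones from the DUMP output above.

```python
# What DESCRIBE reports: five scalar fields.
declared_schema = ["isbn", "bookauthor", "booktitle", "publisher", "yearofpublication"]

# What DUMP actually shows: five (name, value) pairs.
row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)


def row_to_record(pairs):
    """Turn a tuple of (name, value) pairs into a name -> value mapping."""
    return {name: value for name, value in pairs}


record = row_to_record(row)

# The column names match the declared schema, but every field is a pair,
# not the bare chararray/int that the schema promises.
print(list(record) == declared_schema)
print(record["booktitle"])
```

The point of the sketch is only that the names line up while the field types do not, which is why code written against the declared schema fails at runtime.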

This is really causing me problems trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:

flattened = FOREACH data GENERATE
  FLATTEN(isbn),
  FLATTEN(booktitle),
  ...
values = FOREACH flattened GENERATE
  $1 AS ISBN,
  $3 AS BookTitle,
  ...

As soon as I try to access field $5, Pig complains about the index being out of bounds. (Curiously, flattened thinks it has the same schema as the original data.)
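A likely reading of this error is that Pig validates positional references against the declared schema (five fields, so $0 to $4) and not against the actual flattened data. The Python sketch below illustrates that reasoning under stated assumptions: all five pairs are flattened in schema order, and `flatten_pairs` and `project` are hypothetical helpers, not Pig APIs.

```python
declared_width = 5  # number of fields reported by DESCRIBE

row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)


def flatten_pairs(pairs):
    """Flatten (name, value) pairs into one flat tuple: name0, value0, name1, ..."""
    return tuple(item for pair in pairs for item in pair)


def project(flat, index, width):
    """Positional access that is checked against the declared width, not the data."""
    if index >= width:
        raise IndexError(f"column {index} out of bounds for a schema with {width} fields")
    return flat[index]


flat = flatten_pairs(row)
print(len(flat))  # ten positions in the data, but only five in the schema

print(project(flat, 1, declared_width))  # the isbn value, within the declared width

try:
    project(flat, 5, declared_width)
except IndexError as err:
    print(err)  # rejected even though flat[5] exists in the data
```

In this model the value at position 5 exists in the data, yet the lookup is refused because the schema check runs first, which matches the behaviour described above.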

Somehow, CqlStorage seems to be generating the wrong schema, and that schema persists to projections of the original collection. Is there any way to work around this?

(I'm using Cassandra 1.2.8 and Pig 0.11.1)

Recommended answer

This was resolved, together with the error "CCE: BinSedesTuple cannot be cast to String", by applying the fix in https://issues.apache.org/jira/browse/CASSANDRA-5867.

As Alex Lui mentioned in my ticket:

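# Fetch the Cassandra source and switch to the 1.2 branch
git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
cd cassandra
git checkout cassandra-1.2
# Apply the patch (the .txt file must already be in this directory)
patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
# Rebuild Cassandra with the fix
ant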
