CqlStorage generates wrong Pig schema
Problem description
I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a schema based on the Cassandra schema, but it seems to be wrong.
If I do this:
data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;
I get:
data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}
However, if I DUMP data, I get results like these:
((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() would be so different.
This is really causing me problems when I try to access the column values. I tried a naive approach of FLATTENing each tuple, then accessing the values that way:
flattened = FOREACH data GENERATE
FLATTEN(isbn),
FLATTEN(booktitle),
...
values = FOREACH flattened GENERATE
$1 AS ISBN,
$3 AS BookTitle,
...
As soon as I try to access field $5, Pig complains about the index being out of bounds. (Curiously, flattened thinks it has the same schema as the original data.)
Somehow, CqlStorage seems to be generating the wrong schema, and that schema persists to projections of the original collection. Is there any way to work around this?
(I'm using Cassandra 1.2.8 and Pig 0.11.1.)
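The mismatch described above can be modeled in plain Python. This is only an illustrative sketch, not Pig or Cassandra code: it assumes Pig validates positional references such as $5 against the declared five-field schema rather than against the data, and the helpers `flatten` and `positional_in_declared_bounds` are hypothetical names introduced here.

```python
# Illustrative model only: plain Python, not Pig or Cassandra code.

# Field names that DESCRIBE reports (the declared schema, five fields).
declared_schema = ["isbn", "bookauthor", "booktitle", "publisher", "yearofpublication"]

# One row as DUMP actually prints it: each field is a (name, value) tuple.
actual_row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)


def flatten(row):
    """Expand each (name, value) tuple into two top-level fields."""
    return tuple(item for pair in row for item in pair)


def positional_in_declared_bounds(index, schema):
    """Hypothetical check: is $index valid for a schema with len(schema) fields?"""
    return 0 <= index < len(schema)


flat = flatten(actual_row)
print(len(flat))                                           # 10
print(positional_in_declared_bounds(5, declared_schema))   # False
```

Under that assumption, the data really would have ten fields after flattening, but a reference to $5 is rejected because the declared schema only covers $0 through $4.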
Recommended answer
This was resolved (the error was "CCE: BinSedesTuple cannot be cast to String") by applying the fix in https://issues.apache.org/jira/browse/CASSANDRA-5867.
As Alex Lui mentioned in my ticket:
git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
cd cassandra
git checkout cassandra-1.2
patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
ant