Composite key in Cassandra with Pig


Question

We have a CQL table that looks something like this:

CREATE table data (
  occurday  text,
  seqnumber int,
  occurtimems bigint,
  unique bigint,

  fields map<text, text>,

  primary key ((occurday, seqnumber), occurtimems, unique)
)

I can query this table from cqlsh like this:

select * from data where seqnumber = 10 AND occurday = '2013-10-01';

This query works and returns the expected data.

If I execute this query as part of a LOAD from within Pig, however, things don't work.

-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=seqnumber%3D10%20AND%20occurday%3D%272013-10-01%27' USING CqlStorage();    
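The encoded where_clause in the LOAD statement can be reproduced with a short script. This is a minimal sketch in Python (the question itself does not say how the string was encoded); the variable names are illustrative only.

```python
from urllib.parse import quote

# The same predicate that works in cqlsh.
where_clause = "seqnumber=10 AND occurday='2013-10-01'"

# safe="" makes quote() percent-encode every reserved character,
# including '=' (%3D), spaces (%20) and single quotes (%27).
encoded = quote(where_clause, safe="")

# Location string passed to LOAD ... USING CqlStorage() in the Pig script.
load_uri = "cql://ks/data?where_clause=" + encoded
print(load_uri)
```

The resulting string matches the one used in the LOAD statement, so the encoding itself is not the cause of the failure.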

This gives:

InvalidRequestException(why:seqnumber cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:39567)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1625)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1611)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:591)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:621)

Shouldn't these behave the same? Why is the version through Pig failing where the straight cqlsh command works?

Recommended answer

Hadoop uses CqlPagingRecordReader to load your data, and the queries it sends are not identical to the one you entered. To avoid timeouts, the paging record reader fetches small slices of Cassandra data at a time, which it does by adding token() range restrictions on the partition key.

This means that your query is executed as:

SELECT * FROM "data" WHERE token("occurday","seqnumber") > ? AND
token("occurday","seqnumber") <= ? AND occurday='A Great Day' 
AND seqnumber=1 LIMIT 1000 ALLOW FILTERING

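This is why you are seeing the repeated key error: the partition key columns are restricted once by the token() range and again by your equality conditions. I'll submit a bug to the Cassandra project.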

Jira:
https://issues.apache.org/jira/browse/CASSANDRA-6151
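To make the conflict concrete, here is a rough Python sketch that rebuilds the shape of the paged statement and lists which partition key columns the user's where clause restricts a second time. It is not the CqlPagingRecordReader code; build_paged_query and conflicting_keys are hypothetical helpers written only for illustration, and the statement shape is copied from the example in the answer.

```python
import re


def build_paged_query(table, partition_keys, user_where, page_size=1000):
    """Hypothetical helper: mimic the statement shape shown in the answer."""
    quoted = ",".join('"%s"' % key for key in partition_keys)
    token_expr = "token(%s)" % quoted
    # The token range restricts the partition key columns for paging.
    clauses = [token_expr + " > ?", token_expr + " <= ?"]
    if user_where:
        # The user's where clause is appended after the token range.
        clauses.append(user_where)
    return 'SELECT * FROM "%s" WHERE %s LIMIT %d ALLOW FILTERING' % (
        table, " AND ".join(clauses), page_size)


def conflicting_keys(partition_keys, user_where):
    """Hypothetical helper: partition key columns named in the user's clause."""
    return [key for key in partition_keys
            if re.search(r"\b%s\b" % re.escape(key), user_where)]


keys = ["occurday", "seqnumber"]
where = "occurday='A Great Day' AND seqnumber=1"
query = build_paged_query("data", keys, where)
conflicts = conflicting_keys(keys, where)
print(query)
print(conflicts)
```

Any column returned by conflicting_keys appears both inside token(...) and in an equality relation, which is the situation the InvalidRequestException complains about.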
