Running Pig query over data stored in Hive


Question

I would like to know how to run Pig queries over data stored in Hive format. I have configured Hive to store compressed data (using this tutorial: http://wiki.apache.org/hadoop/Hive/CompressedStorage).

Before that, I just used Pig's normal load function with Hive's delimiter (^A). But now Hive stores its data in compressed sequence files. Which load function should I use?
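
In other words, something like this (a minimal sketch; the path and field names are just examples, and \u0001 is Hive's default Ctrl-A field delimiter):

a = LOAD '/user/hive/warehouse/table' USING PigStorage('\u0001') AS (ts:int, user_id:int, url:chararray);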

Note that I don't need tight integration like the one mentioned here (Using Hive with Pig); I just want to know which load function to use to read the compressed sequence files generated by Hive.

Thanks for all the answers.

Answer

Here's what I found out: using HiveColumnarLoader makes sense if you store data as an RCFile. To load a table with it, you first need to register some jars:

register /srv/pigs/piggybank.jar
register /usr/lib/hive/lib/hive-exec-0.5.0.jar
register /usr/lib/hive/lib/hive-common-0.5.0.jar

a = LOAD '/user/hive/warehouse/table' USING org.apache.pig.piggybank.storage.HiveColumnarLoader('ts int, user_id int, url string');
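
Once loaded, the columns declared in the schema string can be referenced by name. A hypothetical follow-up projection, assuming the ts/user_id/url schema above:

b = FOREACH a GENERATE ts, url;
DUMP b;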

To load data from a sequence file, you have to use PiggyBank (as in the previous example). The SequenceFileLoader from PiggyBank should handle compressed files:

register /srv/pigs/piggybank.jar
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
a = LOAD '/user/hive/warehouse/table' USING SequenceFileLoader AS (key:int, value:int);
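
To quickly verify that the compressed records are actually readable, a hypothetical sanity check (field names as declared above):

b = LIMIT a 10;
DUMP b;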

This doesn't work with Pig 0.7, because it's unable to read the BytesWritable type and cast it to a Pig type; you get this exception:

2011-07-01 10:30:08,589 WARN org.apache.pig.piggybank.storage.SequenceFileLoader: Unable to translate key class org.apache.hadoop.io.BytesWritable to a Pig datatype
2011-07-01 10:30:08,625 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class org.apache.hadoop.io.BytesWritable to a Pig datatype
    at org.apache.pig.piggybank.storage.SequenceFileLoader.setKeyType(SequenceFileLoader.java:78)
    at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:132)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:448)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
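
One possible workaround sketch: register a patched copy of the loader that maps BytesWritable to Pig's bytearray and use it instead. The jar path and class name below are assumptions for illustration, not part of PiggyBank:

register /srv/pigs/patched-piggybank.jar
DEFINE PatchedLoader com.example.pig.BytesWritableSequenceFileLoader();
a = LOAD '/user/hive/warehouse/table' USING PatchedLoader AS (key:bytearray, value:bytearray);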

How to compile PiggyBank is described here: Unable to build piggybank -> /home/build/ivy/lib does not exist
